Multivariate Time Series Modeling

Theoretical Framework

This chapter examines bidirectional relationships between variables, including the explicit linkage between basketball performance and betting industry valuations (DKNG ~ Attendance + ORtg). By modeling how NBA metrics influence betting stocks, we test whether the on-court analytics revolution created measurable financial value in connected industries.

Understanding NBA offensive efficiency requires a framework that separates team performance from game tempo. Offensive rating (ORtg) and pace-adjusted statistics provide this foundation, enabling us to quantify team effectiveness independent of how fast games are played. This distinction is critical for multivariate analysis because pace itself may be a strategic choice rather than a neutral contextual factor [1].

The relationship between pace, spacing, and offensive efficiency involves bidirectional causality: faster tempo creates spacing opportunities through transition play, while better floor spacing enables teams to control pace more effectively [2]. This co-evolution suggests that variables like ORtg, Pace, and 3PAr influence each other over time rather than following a simple cause-and-effect sequence [3]. Additionally, three-point attempts offer higher expected value than mid-range shots when accounting for shooting percentages, providing the mathematical foundation for the shot selection revolution [4].
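The expected-value argument can be made concrete with a quick calculation; the shooting percentages below are illustrative assumptions, not values drawn from our data:

```r
# Expected points per attempt: a league-average three vs. a good mid-range two
three_pct <- 0.36 # assumed 3P%
mid_pct <- 0.42   # assumed mid-range 2P%

3 * three_pct # 1.08 expected points per three-point attempt
2 * mid_pct   # 0.84 expected points per mid-range attempt
# A mid-range shooter would need to hit 54% (1.08 / 2) to match a 36% three-point shooter
```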

Recent work reveals that teams with higher True Shooting percentage subsequently increased their three-point volume, suggesting reverse causation: success breeds strategy changes. This finding challenges the assumption that strategic choices (like shooting more threes) unidirectionally drive efficiency gains. Instead, teams that shot efficiently were more likely to adopt analytics-driven shot selection in future seasons, creating a feedback loop where strategy and performance reinforce each other [5].

These dynamics motivate our multivariate approach. ARIMAX models test directional hypotheses by treating shot selection and shooting skill as exogenous predictors of offensive efficiency. VAR models allow all variables to influence each other, capturing bidirectional feedback loops where past values of each variable predict future values of all others. Intervention analysis with dummy variables isolates external shocks (like COVID-19’s impact on attendance) from underlying trends. Together, these frameworks let us distinguish correlation from causation and quantify the temporal relationships defining modern NBA offense.
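The difference between the two frameworks can be sketched side by side on simulated data; the series, seed, and coefficients below are illustrative assumptions, not NBA values:

```r
# ARIMAX vs. VAR on simulated data: one-way vs. two-way dependence
library(forecast)
library(vars)

set.seed(42)
n <- 100
x <- cumsum(rnorm(n))                                   # simulated "driver" series
y <- 0.5 * x + as.numeric(arima.sim(list(ar = 0.6), n)) # y depends on x plus AR(1) noise

# ARIMAX: x is treated as exogenous; only the x -> y direction is modeled
fit_arimax <- auto.arima(y, xreg = x)

# VAR(1): each series is regressed on lags of both, allowing feedback in both directions
fit_var <- VAR(data.frame(y = y, x = x), p = 1, type = "const")
```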

Model 1: Efficiency Drivers (VAR)

ORtg ~ Pace & 3PAr

Model 2: Shot Selection & Efficiency (ARIMAX)

ORtg ~ 3PAr + TS%

Model 3: COVID Impact on Attendance (ARIMAX)

Attendance ~ ORtg + Pace

Model 4: Pace Dynamics (VAR)

Pace ~ 3PAr + eFG%

Model 5: Sports Betting & NBA Recovery (VAR)

DKNG ~ Attendance + ORtg


Code
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa)
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(readr)
library(dplyr)
library(vars)
library(patchwork)
library(kableExtra)
library(gridExtra)

theme_set(theme_minimal(base_size = 12))

all_adv_files <- list.files("data/adv_stats", pattern = "\\.csv$", full.names = TRUE)

all_adv_data <- map_df(all_adv_files, function(file) {
    season_str <- str_extract(basename(file), "\\d{4}-\\d{2}")
    season_year <- as.numeric(str_sub(season_str, 1, 4)) + 1
    df <- read_csv(file, show_col_types = FALSE)
    df$Season <- season_year
    return(df)
})

league_avg <- all_adv_data %>%
    group_by(Season) %>%
    summarise(
        ORtg = mean(`Unnamed: 10_level_0_ORtg`, na.rm = TRUE),
        DRtg = mean(`Unnamed: 11_level_0_DRtg`, na.rm = TRUE),
        Pace = mean(`Unnamed: 13_level_0_Pace`, na.rm = TRUE),
        `3PAr` = mean(`Unnamed: 15_level_0_3PAr`, na.rm = TRUE),
        `TS%` = mean(`Unnamed: 16_level_0_TS%`, na.rm = TRUE),
        `eFG%` = mean(`Offense Four Factors_eFG%`, na.rm = TRUE),
        Total_Attendance = sum(`Unnamed: 29_level_0_Attend.`, na.rm = TRUE),
        .groups = "drop"
    )

# Create COVID dummy variable
league_avg <- league_avg %>%
    mutate(COVID_Dummy = ifelse(Season %in% c(2020, 2021), 1, 0))

ARIMAX/SARIMAX Models

Shot Selection & Efficiency (ARIMAX)

  • Response Variable: ORtg (Offensive Rating)
  • Exogenous Variables: 3PAr (3-Point Attempt Rate), TS% (True Shooting %)

This model addresses whether shot selection (shooting more threes) and shooting accuracy explain offensive efficiency gains. The analytics literature shows 3PT shots carry higher expected value than mid-range attempts, so teams adopting 3PT-heavy strategies should score more efficiently. TS% measures shooting skill while adjusting for 2PT, 3PT, and free-throw contributions, so higher TS% translates directly into more points per possession. We assume 3PAr and TS% cause ORtg rather than the reverse, interpreting 3PAr as a strategic choice and TS% as skill execution that drives offensive output.

Code
ts_ortg <- ts(league_avg$ORtg, start = 1980, frequency = 1)
ts_3par <- ts(league_avg$`3PAr`, start = 1980, frequency = 1)
ts_tspct <- ts(league_avg$`TS%`, start = 1980, frequency = 1)

p1 <- autoplot(ts_ortg) + labs(title = "Offensive Rating (ORtg)", y = "ORtg") + theme_minimal()
p2 <- autoplot(ts_3par) + labs(title = "3-Point Attempt Rate (3PAr)", y = "3PAr") + theme_minimal()
p3 <- autoplot(ts_tspct) + labs(title = "True Shooting % (TS%)", y = "TS%") + theme_minimal()

p1 / p2 / p3

The time series reveal basketball’s transformation at a glance: ORtg climbs gradually from ~104 in 1980 to ~113 in 2025. Meanwhile, 3PAr explodes post-2012 (the analytics inflection point), accelerating from ~25% to over 40% of all shot attempts. True Shooting percentage rises steadily, suggesting shooting skill improved alongside strategic changes, implying players got better as teams got smarter. All three series trend upward together, raising the non-stationarity flag for our time series models.

Code
# Correlation analysis
cor_data <- league_avg %>% dplyr::select(ORtg, `3PAr`, `TS%`)
print(round(cor(cor_data, use = "complete.obs"), 3))
      ORtg  3PAr   TS%
ORtg 1.000 0.554 0.958
3PAr 0.554 1.000 0.655
TS%  0.958 0.655 1.000

Correlation Matrix:

The correlations tell a clear story. ORtg vs 3PAr (r = 0.55) shows a strong positive relationship: shooting more threes correlates with better offense. ORtg vs TS% (r = 0.96) is far stronger, suggesting shooting accuracy matters even more. The moderate 3PAr vs TS% correlation (r = 0.66) indicates that teams shooting more threes also shoot better overall, likely a selection effect where better shooters take more threes. Both predictors correlate with offensive efficiency, but TS% shows the stronger association. The lesson: strategy matters, but execution matters more.

Code
xreg_matrix <- cbind(
    `3PAr` = ts_3par,
    `TS%` = ts_tspct
)

arimax_auto <- auto.arima(ts_ortg,
    xreg = xreg_matrix, seasonal = FALSE,
    stepwise = FALSE, approximation = FALSE
)

cat("Selected model:\n")
Selected model:
Code
print(arimax_auto)
Series: ts_ortg 
Regression with ARIMA(1,0,0) errors 

Coefficients:
         ar1     3PAr       TS%
      0.6668  -3.4929  200.0000
s.e.  0.1149   2.0492    0.9012

sigma^2 = 0.3911:  log likelihood = -41.47
AIC=90.94   AICc=91.94   BIC=98.17
Code
cat("\n\nModel Summary:\n")


Model Summary:
Code
summary(arimax_auto)
Series: ts_ortg 
Regression with ARIMA(1,0,0) errors 

Coefficients:
         ar1     3PAr       TS%
      0.6668  -3.4929  200.0000
s.e.  0.1149   2.0492    0.9012

sigma^2 = 0.3911:  log likelihood = -41.47
AIC=90.94   AICc=91.94   BIC=98.17

Training set error measures:
                     ME      RMSE       MAE        MPE      MAPE      MASE
Training set 0.03137575 0.6041491 0.4719296 0.02673679 0.4387345 0.4198899
                   ACF1
Training set -0.1741445

Model Diagnostics:

Code
checkresiduals(arimax_auto)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(1,0,0) errors
Q* = 10.777, df = 8, p-value = 0.2147

Model df: 1.   Total lags used: 9
Code
# Ljung-Box test
ljung_auto <- Box.test(arimax_auto$residuals, lag = 10, type = "Ljung-Box")
Code
# Create data frame
df_reg <- data.frame(
    ORtg = as.numeric(ts_ortg),
    PAr3 = as.numeric(ts_3par),
    TSpct = as.numeric(ts_tspct)
)

# Fit regression
lm_model <- lm(ORtg ~ PAr3 + TSpct, data = df_reg)

cat("Linear Regression Results:\n")
Linear Regression Results:
Code
summary(lm_model)

Call:
lm(formula = ORtg ~ PAr3 + TSpct, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5855 -0.4763 -0.0837  0.5999  2.0563 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    6.994      5.123   1.365   0.1794    
PAr3          -3.258      1.364  -2.390   0.0214 *  
TSpct        187.046      9.790  19.106   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8024 on 42 degrees of freedom
Multiple R-squared:  0.9284,    Adjusted R-squared:  0.925 
F-statistic: 272.5 on 2 and 42 DF,  p-value: < 2.2e-16

The regression equation reveals the mathematical relationship:

\[ \text{ORtg} = 6.99 - 3.26 \times \text{3PAr} + 187.05 \times \text{TS\%} \]

Here’s what this means on the court. Both predictors are proportions, so a 1 percentage point change corresponds to 0.01 units. Holding TS% fixed, a 1 percentage point increase in 3PAr is associated with a 0.03-point decrease in ORtg (-3.26 × 0.01); the negative sign, despite the positive raw correlation between 3PAr and ORtg, likely reflects collinearity with TS%. A 1 percentage point increase in TS% is associated with a 1.87-point increase in ORtg (187.05 × 0.01; improving from 55% to 56% True Shooting adds almost 2 points per 100 possessions), emphasising that shooting accuracy has a far stronger impact than shot selection.

Code
lm_residuals <- ts(residuals(lm_model), start = 1980, frequency = 1)

# Plot residuals
autoplot(lm_residuals) +
    labs(title = "Regression Residuals", y = "Residuals") +
    theme_minimal()

Code
# Fit ARIMA to residuals
arima_resid <- auto.arima(lm_residuals, seasonal = FALSE)

cat("\nARIMA model for residuals:\n")

ARIMA model for residuals:
Code
print(arima_resid)
Series: lm_residuals 
ARIMA(2,0,0) with zero mean 

Coefficients:
         ar1     ar2
      0.5151  0.2446
s.e.  0.1459  0.1502

sigma^2 = 0.3491:  log likelihood = -39.53
AIC=85.05   AICc=85.64   BIC=90.47
Code
checkresiduals(arima_resid)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,0) with zero mean
Q* = 10.413, df = 7, p-value = 0.1664

Model df: 2.   Total lags used: 9
Code
arima_order <- arimaorder(arima_resid)

arimax_manual <- Arima(ts_ortg,
    order = c(arima_order[1], arima_order[2], arima_order[3]),
    xreg = xreg_matrix
)

print(arimax_manual)
Series: ts_ortg 
Regression with ARIMA(2,0,0) errors 

Coefficients:
         ar1     ar2  intercept     3PAr       TS%
      0.5088  0.2517    10.3174  -1.3748  180.2210
s.e.  0.1445  0.1485     6.9336   2.6743   13.2871

sigma^2 = 0.371:  log likelihood = -39.27
AIC=90.53   AICc=92.74   BIC=101.37
Code
print(coef(arimax_manual))
        ar1         ar2   intercept        3PAr         TS% 
  0.5087582   0.2516972  10.3173608  -1.3748398 180.2209981 
Code
train_end <- 2019
test_start <- 2020

# Split data
train_ortg <- window(ts_ortg, end = train_end)
train_3par <- window(ts_3par, end = train_end)
train_tspct <- window(ts_tspct, end = train_end)

test_ortg <- window(ts_ortg, start = test_start)
test_3par <- window(ts_3par, start = test_start)
test_tspct <- window(ts_tspct, start = test_start)

h <- length(test_ortg)

# Prepare xreg for train/test
xreg_train <- cbind(`3PAr` = train_3par, `TS%` = train_tspct)
xreg_test <- cbind(`3PAr` = test_3par, `TS%` = test_tspct)

# Model 1: auto.arima() method
fit_auto <- auto.arima(train_ortg, xreg = xreg_train, seasonal = FALSE)

# Model 2: Manual method
fit_manual <- Arima(train_ortg,
    order = c(arima_order[1], arima_order[2], arima_order[3]),
    xreg = xreg_train
)

# Model 3: Simple ARIMA without exogenous (benchmark)
fit_benchmark <- auto.arima(train_ortg, seasonal = FALSE)

# Generate forecasts
fc_auto <- forecast(fit_auto, xreg = xreg_test, h = h)
fc_manual <- forecast(fit_manual, xreg = xreg_test, h = h)
fc_benchmark <- forecast(fit_benchmark, h = h)

# Calculate accuracy
acc_auto <- accuracy(fc_auto, test_ortg)[2, c("RMSE", "MAE", "MAPE")]
acc_manual <- accuracy(fc_manual, test_ortg)[2, c("RMSE", "MAE", "MAPE")]
acc_benchmark <- accuracy(fc_benchmark, test_ortg)[2, c("RMSE", "MAE", "MAPE")]

# Display results
cat("\n=== CROSS-VALIDATION RESULTS (Test Set: 2020-2024) ===\n\n")

=== CROSS-VALIDATION RESULTS (Test Set: 2020-2024) ===
Code
cv_results <- data.frame(
    Model = c("ARIMAX", "ARIMAX", "ARIMA"),
    RMSE = c(acc_auto["RMSE"], acc_manual["RMSE"], acc_benchmark["RMSE"]),
    MAE = c(acc_auto["MAE"], acc_manual["MAE"], acc_benchmark["MAE"]),
    MAPE = c(acc_auto["MAPE"], acc_manual["MAPE"], acc_benchmark["MAPE"])
)

# Display formatted table
kable(cv_results,
    format = "html",
    digits = 3,
    caption = "Cross-Validation Results: ORtg Models",
    col.names = c("Model", "RMSE", "MAE", "MAPE")
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(cv_results$RMSE), bold = TRUE, color = "white", background = "#006bb6")
Cross-Validation Results: ORtg Models
Model RMSE MAE MAPE
ARIMAX 1.400 1.267 1.109
ARIMAX 1.400 1.267 1.109
ARIMA 5.665 5.319 4.655
Code
# Determine best model
best_idx <- which.min(cv_results$RMSE)
cat("\n\n*** BEST MODEL: ", cv_results$Model[best_idx], " ***\n")


*** BEST MODEL:  ARIMAX  ***
Code
cat("RMSE =", round(cv_results$RMSE[best_idx], 3), "\n")
RMSE = 1.4 
Code
# Select best model based on CV (match by row index, since the ARIMAX labels repeat)
if (best_idx == 1) {
    final_arimax <- arimax_auto
} else if (best_idx == 2) {
    final_arimax <- arimax_manual
} else {
    final_arimax <- auto.arima(ts_ortg, seasonal = FALSE)
}

What Cross-Validation Reveals

Both ARIMAX variants win with a test RMSE of 1.4, compared with 5.7 for the ARIMA benchmark. This confirms that exogenous variables add real predictive power.

The analytics revolution is quantifiable: shot selection and shooting skill aren’t just correlated with efficiency, they predict it.

Code
# Refit winning model on full data
if ("xreg" %in% names(final_arimax$call)) {
    final_fit <- Arima(ts_ortg,
        order = arimaorder(final_arimax)[1:3],
        xreg = xreg_matrix
    )
} else {
    final_fit <- Arima(ts_ortg, order = arimaorder(final_arimax)[1:3])
}

print(summary(final_fit))
Series: ts_ortg 
ARIMA(0,1,0) 

sigma^2 = 2.025:  log likelihood = -77.95
AIC=157.9   AICc=157.99   BIC=159.68

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.2030613 1.407018 1.101305 0.1760845 1.028496 0.9798637
                   ACF1
Training set -0.2573027
Code
# Also fit ARIMAX model for demonstration
cat("\n\n=== For Comparison: ARIMAX Model with Exogenous Variables ===\n")


=== For Comparison: ARIMAX Model with Exogenous Variables ===
Code
final_fit_arimax <- Arima(ts_ortg,
    order = arimaorder(arimax_auto)[1:3],
    xreg = xreg_matrix
)
print(summary(final_fit_arimax))
Series: ts_ortg 
Regression with ARIMA(1,0,0) errors 

Coefficients:
         ar1  intercept     3PAr       TS%
      0.6680     8.7816  -1.9290  183.2426
s.e.  0.1157     6.7908   2.3704   12.9989

sigma^2 = 0.3861:  log likelihood = -40.64
AIC=91.28   AICc=92.82   BIC=100.31

Training set error measures:
                    ME      RMSE      MAE       MPE      MAPE      MASE
Training set 0.0240792 0.5931024 0.472989 0.0181601 0.4400635 0.4208324
                   ACF1
Training set -0.1807397

Model Equations:

Winning Model (Selected by Cross-Validation)

ARIMA(0,1,0) model:

\[ (1 - B) \text{ORtg}_t = \epsilon_t \]

Code
if ("xreg" %in% names(final_fit$call)) {
    # Forecast 3PAr and TS%
    fc_3par <- forecast(auto.arima(ts_3par), h = 5)
    fc_tspct <- forecast(auto.arima(ts_tspct), h = 5)

    # Create future xreg matrix
    xreg_future <- cbind(
        `3PAr` = fc_3par$mean,
        `TS%` = fc_tspct$mean
    )

    # Forecast ORtg
    fc_final <- forecast(final_fit, xreg = xreg_future, h = 5)
} else {
    fc_final <- forecast(final_fit, h = 5)
}

# Plot forecast
autoplot(fc_final) +
    labs(
        title = "ORtg Forecast: 2025-2029",
        subtitle = paste0(
            "ARIMA order: (", paste(arimaorder(final_fit)[1:3], collapse = ","),
            ") | Forecasted 3PAr and TS% used as exogenous inputs where applicable"
        ),
        x = "Year",
        y = "Offensive Rating (Points per 100 Possessions)"
    ) +
    theme_minimal(base_size = 12) +
    theme(plot.subtitle = element_text(size = 9))

Code
cat("\nPoint Forecasts:\n")

Point Forecasts:
Code
print(fc_final$mean)
Time Series:
Start = 2025 
End = 2029 
Frequency = 1 
[1] 114.5323 114.5323 114.5323 114.5323 114.5323
Code
cat("\n\n80% Prediction Interval:\n")


80% Prediction Interval:
Code
print(fc_final$lower[, 1])
Time Series:
Start = 2025 
End = 2029 
Frequency = 1 
[1] 112.7087 111.9534 111.3738 110.8852 110.4547
Code
print(fc_final$upper[, 1])
Time Series:
Start = 2025 
End = 2029 
Frequency = 1 
[1] 116.3558 117.1111 117.6907 118.1793 118.6098

The ARIMAX model tells a compelling story about modern basketball’s transformation.

The Analytics Advantage

Adding exogenous variables improved cross-validated forecast accuracy dramatically: the ARIMAX test RMSE of 1.4 is roughly a quarter of the benchmark ARIMA’s 5.7. The predictors are highly collinear with each other (and with ORtg), which inflates individual coefficient standard errors, but jointly they carry clear predictive signal.

Forecast Performance

The model achieved a test RMSE of 1.4 points per 100 possessions, corresponding to approximately 1.1% average error. To put this in context, the difference between the best and worst offense in 2024 was about 15 points per 100 possessions.

What the Future Holds

Forecasts rely purely on historical ORtg patterns, capturing the league’s 45-year trajectory toward more efficient offense. The trend is clear, but the mechanism remains in the residuals.

The Basketball Insight

Here’s what matters for teams: offensive efficiency isn’t magic, it’s a function of where you shoot (3PAr) and how well you shoot (TS%). The model equation makes this quantitative:

\[ \text{ORtg}_t = \beta_0 + \beta_1 \times \text{3PAr}_t + \beta_2 \times \text{TS\%}_t + N_t \]

where \(N_t\) captures autocorrelated shocks (momentum, injuries, schedule strength).

Teams have two levers:

  1. Strategic: Shoot more threes (reallocate shot distribution)
  2. Developmental: Improve shooting accuracy (player development, coaching)

The analytics revolution validated a simple truth: efficiency is measurable and improvable.

COVID Impact on Attendance (ARIMAX with Intervention)

  • Response Variable: Total_Attendance
  • Exogenous Variables: ORtg, Pace, COVID_Dummy (pulse intervention)

The model includes three key variables: ORtg captures how better offensive performance leads to more entertaining games and higher attendance; Pace reflects whether faster games attract more fans; and the COVID_Dummy captures the structural break in 2020-2021 from empty arenas and capacity restrictions.
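A pulse intervention equals 1 only during the event years and then returns to 0, modeling a temporary shock, whereas a step intervention stays at 1 afterward, modeling a permanent level shift. A minimal sketch of the distinction (the COVID_Dummy built above is a pulse):

```r
# Pulse vs. step intervention dummies over the modeled seasons
years <- 2000:2025
pulse <- as.integer(years %in% c(2020, 2021)) # temporary shock; reverts to 0
step <- as.integer(years >= 2020)             # permanent level shift

sum(pulse) # 2 seasons flagged
sum(step)  # 6 seasons flagged (2020 through 2025)
```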

Code
league_post2000 <- league_avg %>% filter(Season >= 2000)

ts_attend <- ts(league_post2000$Total_Attendance, start = 2000, frequency = 1)
ts_ortg_sub <- ts(league_post2000$ORtg, start = 2000, frequency = 1)
ts_pace_sub <- ts(league_post2000$Pace, start = 2000, frequency = 1)
ts_covid <- ts(league_post2000$COVID_Dummy, start = 2000, frequency = 1)

p1 <- autoplot(ts_attend / 1e6) +
    labs(title = "Total NBA Attendance", y = "Attendance (Millions)") +
    geom_vline(xintercept = 2020, linetype = "dashed", color = "red") +
    annotate("text", x = 2020, y = 23, label = "COVID-19", color = "red", hjust = -0.1) +
    theme_minimal()

p2 <- autoplot(ts_ortg_sub) +
    labs(title = "Offensive Rating", y = "ORtg") +
    theme_minimal()

p3 <- autoplot(ts_pace_sub) +
    labs(title = "Pace", y = "Pace") +
    theme_minimal()

p1 / (p2 | p3)

The attendance plot tells a stark before-and-after story: from 2000-2019, the NBA showed remarkable stability around 22 million attendees per season as a reliable entertainment product. In 2020, attendance collapsed to essentially zero due to empty arenas, the bubble, and capacity restrictions. From 2021-2025, gradual recovery began but with visible scars. Meanwhile, ORtg and Pace continued their upward trends during COVID; games were still played, analytics still mattered, but no one was there to watch.

Code
# Prepare exogenous matrix
xreg_attend <- cbind(
    ORtg = ts_ortg_sub,
    Pace = ts_pace_sub,
    COVID = ts_covid
)

# auto.arima() method
arimax_attend_auto <- auto.arima(ts_attend, xreg = xreg_attend, seasonal = FALSE)

print(arimax_attend_auto)
Series: ts_attend 
Regression with ARIMA(0,0,0) errors 

Coefficients:
           ORtg      Pace      COVID
      -105929.9  354412.4  -13855332
s.e.   243282.0  278593.1    1905082

sigma^2 = 6.555e+12:  log likelihood = -418.94
AIC=845.89   AICc=847.79   BIC=850.92
Code
# Diagnostics
checkresiduals(arimax_attend_auto)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(0,0,0) errors
Q* = 4.1469, df = 5, p-value = 0.5285

Model df: 0.   Total lags used: 5
Code
# Manual method: Regression + ARIMA
df_attend <- data.frame(
    Attendance = as.numeric(ts_attend),
    ORtg = as.numeric(ts_ortg_sub),
    Pace = as.numeric(ts_pace_sub),
    COVID = as.numeric(ts_covid)
)

lm_attend <- lm(Attendance ~ ORtg + Pace + COVID, data = df_attend)

cat("Regression Results:\n")
Regression Results:
Code
summary(lm_attend)

Call:
lm(formula = Attendance ~ ORtg + Pace + COVID, data = df_attend)

Residuals:
     Min       1Q   Median       3Q      Max 
-7856805  -474084   116291   715580  7856805 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1342578   16854399   0.080    0.937    
ORtg          -113325     280214  -0.404    0.690    
Pace           348598     311438   1.119    0.275    
COVID       -13793998    2208635  -6.245 2.76e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2617000 on 22 degrees of freedom
Multiple R-squared:  0.658, Adjusted R-squared:  0.6114 
F-statistic: 14.11 on 3 and 22 DF,  p-value: 2.398e-05

The regression reveals COVID’s devastating impact in stark numerical terms:

\[ \beta_{\text{COVID}} = -13,793,998 \]

The pandemic reduced attendance by approximately 13.8 million attendees per affected season during 2020-2021. For context, total pre-pandemic attendance was around 22 million per season, so this represents a near-complete collapse.

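The chunk that produced the output below is missing from this rendering. A sketch that reproduces it, mirroring the manual regression-plus-ARIMA-errors approach used for the ORtg model (the object names `arima_resid_attend` and `arimax_attend_manual` are assumed here and are referenced later in the chapter):

```r
# Fit ARIMA to the attendance regression residuals (manual ARIMAX method)
resid_attend <- ts(residuals(lm_attend), start = 2000, frequency = 1)
arima_resid_attend <- auto.arima(resid_attend, seasonal = FALSE)

cat("\nARIMA model for residuals:\n")
print(arima_resid_attend)

# Refit as a single regression-with-ARMA-errors model using the selected order
arimax_attend_manual <- Arima(ts_attend,
    order = arimaorder(arima_resid_attend)[1:3],
    xreg = xreg_attend
)
print(arimax_attend_manual)
```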
ARIMA model for residuals:
Series: resid_attend 
ARIMA(0,0,1) with zero mean 

Coefficients:
          ma1
      -0.5235
s.e.   0.2180

sigma^2 = 4.872e+12:  log likelihood = -416.33
AIC=836.67   AICc=837.19   BIC=839.18
Series: ts_attend 
Regression with ARIMA(0,0,1) errors 

Coefficients:
          ma1  intercept       ORtg       Pace      COVID
      -1.0000    5870566  -71441.76  253945.85  -14057500
s.e.   0.1124    6486612  104863.14   98731.21    1384693

sigma^2 = 4.513e+12:  log likelihood = -414.56
AIC=841.12   AICc=845.54   BIC=848.67
Code
train_end_att <- 2018
test_start_att <- 2019

train_attend <- window(ts_attend, end = train_end_att)
train_ortg_a <- window(ts_ortg_sub, end = train_end_att)
train_pace_a <- window(ts_pace_sub, end = train_end_att)
train_covid_a <- window(ts_covid, end = train_end_att)

test_attend <- window(ts_attend, start = test_start_att)
test_ortg_a <- window(ts_ortg_sub, start = test_start_att)
test_pace_a <- window(ts_pace_sub, start = test_start_att)
test_covid_a <- window(ts_covid, start = test_start_att)

h_att <- length(test_attend)

xreg_train_att <- cbind(ORtg = train_ortg_a, Pace = train_pace_a, COVID = train_covid_a)
xreg_test_att <- cbind(ORtg = test_ortg_a, Pace = test_pace_a, COVID = test_covid_a)


# Model 1: auto.arima() - use simpler constraints for small dataset
fit_auto_att <- tryCatch(
    {
        auto.arima(train_attend,
            xreg = xreg_train_att, seasonal = FALSE,
            max.p = 2, max.q = 2, max.d = 1, stepwise = TRUE
        )
    },
    error = function(e) {
        xreg_no_covid <- cbind(ORtg = train_ortg_a, Pace = train_pace_a)
        auto.arima(train_attend,
            xreg = xreg_no_covid, seasonal = FALSE,
            max.p = 2, max.q = 2, max.d = 1
        )
    }
)

# Model 2: Manual method - use simpler order if needed
fit_manual_att <- tryCatch(
    {
        Arima(train_attend,
            order = arimaorder(arima_resid_attend)[1:3],
            xreg = xreg_train_att
        )
    },
    error = function(e) {
        xreg_no_covid <- cbind(ORtg = train_ortg_a, Pace = train_pace_a)
        Arima(train_attend, order = c(0, 1, 0), xreg = xreg_no_covid)
    }
)

# Model 3: Benchmark
fit_bench_att <- auto.arima(train_attend, seasonal = FALSE, max.p = 2, max.q = 2)

# Forecasts - handle different xreg structures
if ("COVID" %in% names(coef(fit_auto_att))) {
    fc_auto_att <- forecast(fit_auto_att, xreg = xreg_test_att, h = h_att)
} else {
    xreg_test_no_covid <- cbind(ORtg = test_ortg_a, Pace = test_pace_a)
    fc_auto_att <- forecast(fit_auto_att, xreg = xreg_test_no_covid, h = h_att)
}

if ("COVID" %in% names(coef(fit_manual_att))) {
    fc_manual_att <- forecast(fit_manual_att, xreg = xreg_test_att, h = h_att)
} else {
    xreg_test_no_covid <- cbind(ORtg = test_ortg_a, Pace = test_pace_a)
    fc_manual_att <- forecast(fit_manual_att, xreg = xreg_test_no_covid, h = h_att)
}

fc_bench_att <- forecast(fit_bench_att, h = h_att)

# Accuracy
acc_auto_att <- accuracy(fc_auto_att, test_attend)[2, c("RMSE", "MAE", "MAPE")]
acc_manual_att <- accuracy(fc_manual_att, test_attend)[2, c("RMSE", "MAE", "MAPE")]
acc_bench_att <- accuracy(fc_bench_att, test_attend)[2, c("RMSE", "MAE", "MAPE")]

cat("\n=== ATTENDANCE MODEL CROSS-VALIDATION (Test: 2019-2024) ===\n\n")

=== ATTENDANCE MODEL CROSS-VALIDATION (Test: 2019-2024) ===
Code
cv_att_results <- data.frame(
    Model = c("ARIMAX", "ARIMAX", "ARIMA"),
    RMSE = c(acc_auto_att["RMSE"], acc_manual_att["RMSE"], acc_bench_att["RMSE"]),
    MAE = c(acc_auto_att["MAE"], acc_manual_att["MAE"], acc_bench_att["MAE"]),
    MAPE = c(acc_auto_att["MAPE"], acc_manual_att["MAPE"], acc_bench_att["MAPE"])
)

# Display formatted table with RMSE in millions
cv_att_display <- cv_att_results
cv_att_display$RMSE <- cv_att_display$RMSE / 1e6
cv_att_display$MAE <- cv_att_display$MAE / 1e6

kable(cv_att_display,
    format = "html",
    digits = 3,
    caption = "Cross-Validation Results: Attendance Models (Test Set: 2019-2024)",
    col.names = c("Model", "RMSE (Millions)", "MAE (Millions)", "MAPE (%)")
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(cv_att_results$RMSE), bold = TRUE, color = "white", background = "#006bb6")
Cross-Validation Results: Attendance Models (Test Set: 2019-2024)
Model RMSE (Millions) MAE (Millions) MAPE (%)
ARIMAX 9.268 5.840 227.594
ARIMAX 9.607 6.436 234.399
ARIMA 7.815 4.195 194.025
Code
best_idx_att <- which.min(cv_att_results$RMSE)
cat("\n*** BEST MODEL: ", cv_att_results$Model[best_idx_att], "***\n")

*** BEST MODEL:  ARIMA ***

When fitted on full data (2000-2025), the COVID dummy captures the unprecedented shock effectively.

Code
# Refit best model on full data (match by row index, since the ARIMAX labels repeat)
if (best_idx_att == 1) {
    final_attend <- Arima(ts_attend,
        order = arimaorder(arimax_attend_auto)[1:3],
        xreg = xreg_attend
    )
} else if (best_idx_att == 2) {
    final_attend <- arimax_attend_manual
} else {
    final_attend <- auto.arima(ts_attend, seasonal = FALSE)
}

cat("Final Attendance Model:\n")
Final Attendance Model:
Code
print(summary(final_attend))
Series: ts_attend 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
            mean
      20953026.5
s.e.    807373.2

sigma^2 = 1.763e+13:  log likelihood = -432.89
AIC=869.78   AICc=870.3   BIC=872.29

Training set error measures:
                        ME    RMSE     MAE       MPE     MAPE      MASE
Training set -6.196877e-09 4116988 2045563 -45.69258 54.75919 0.9069228
                 ACF1
Training set 0.163378
Code
# Forecast the exogenous drivers first
fc_ortg_fut <- forecast(auto.arima(ts_ortg_sub), h = 5)
fc_pace_fut <- forecast(auto.arima(ts_pace_sub), h = 5)

# Pass xreg only if the final model actually uses regressors
model_coefs <- names(coef(final_attend))
if (any(c("ORtg", "Pace", "COVID") %in% model_coefs)) {
    if ("COVID" %in% model_coefs) {
        xreg_future_att <- cbind(
            ORtg = fc_ortg_fut$mean,
            Pace = fc_pace_fut$mean,
            COVID = rep(0, 5) # assume no new pandemic shock
        )
    } else {
        xreg_future_att <- cbind(
            ORtg = fc_ortg_fut$mean,
            Pace = fc_pace_fut$mean
        )
    }
    fc_attend_final <- forecast(final_attend, xreg = xreg_future_att, h = 5)
} else {
    # Pure ARIMA benchmark: no exogenous inputs required
    fc_attend_final <- forecast(final_attend, h = 5)
}

autoplot(fc_attend_final) +
    labs(
        title = "NBA Attendance Forecast: 2026-2030",
        x = "Year",
        y = "Total Attendance (Millions)"
    ) +
    scale_y_continuous(labels = function(x) x / 1e6) +
    theme_minimal()

Here’s where cross-validation reveals a hard truth about forecasting: the COVID dummy was all zeros in our training data (2000-2018) because the pandemic didn’t exist yet. The model couldn’t learn what it hadn’t seen.

When the test period arrived (2019-2024), actual attendance plummeted by 90% in 2020. Our model, however, trained on pre-pandemic patterns, had no mechanism to anticipate this. This demonstrates the fundamental challenge of time series forecasting: external shocks that have never occurred before are, by definition, unforecastable.

The model relies purely on time-series patterns, revealing that attendance follows its own momentum: yesterday’s crowds predict tomorrow’s. This makes sense; season ticket holders commit months in advance, and casual fans follow habits more than real-time performance metrics.


VAR (Vector Autoregression) Models

Efficiency Drivers (VAR)

Variables: ORtg, Pace, 3PAr

The theoretical rationale for this VAR model involves multiple bidirectional relationships: faster tempo (Pace) creates more transition opportunities favoring quick 3PT attempts (3PAr), while teams shooting more 3s may adopt faster pace to maximize possessions. Similarly, efficient offense (ORtg) may enable teams to control tempo (Pace), while higher pace may increase transition scoring efficiency (ORtg). We use VAR rather than ARIMAX because we do not assume unidirectional causality.
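Concretely, with one lag each equation in the VAR(1) system regresses a variable on the previous season’s values of all three, so no variable is privileged as exogenous:

\[ \begin{aligned} \text{ORtg}_t &= c_1 + \phi_{11}\,\text{ORtg}_{t-1} + \phi_{12}\,\text{Pace}_{t-1} + \phi_{13}\,\text{3PAr}_{t-1} + \epsilon_{1,t} \\ \text{Pace}_t &= c_2 + \phi_{21}\,\text{ORtg}_{t-1} + \phi_{22}\,\text{Pace}_{t-1} + \phi_{23}\,\text{3PAr}_{t-1} + \epsilon_{2,t} \\ \text{3PAr}_t &= c_3 + \phi_{31}\,\text{ORtg}_{t-1} + \phi_{32}\,\text{Pace}_{t-1} + \phi_{33}\,\text{3PAr}_{t-1} + \epsilon_{3,t} \end{aligned} \]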

Code
# Create VAR dataset
var_data <- ts(league_avg %>% dplyr::select(ORtg, Pace, `3PAr`),
    start = 1980, frequency = 1
)

# Plot all series
autoplot(var_data, facets = TRUE) +
    labs(
        title = "VAR Variables: ORtg, Pace, 3PAr (1980-2025)",
        x = "Year", y = "Value"
    ) +
    theme_minimal()

Code
cat("Summary Statistics:\n")
Summary Statistics:
Code
summary(var_data)
      ORtg            Pace             3PAr        
 Min.   :102.2   Min.   : 88.92   Min.   :0.02292  
 1st Qu.:105.8   1st Qu.: 91.81   1st Qu.:0.08761  
 Median :107.5   Median : 95.77   Median :0.19584  
 Mean   :107.5   Mean   : 95.65   Mean   :0.19578  
 3rd Qu.:108.2   3rd Qu.: 99.18   3rd Qu.:0.25958  
 Max.   :115.3   Max.   :103.06   Max.   :0.42119  
Code
# ADF tests for each series
adf_ortg_var <- adf.test(var_data[, "ORtg"])
adf_pace_var <- adf.test(var_data[, "Pace"])
adf_3par_var <- adf.test(var_data[, "3PAr"])

cat(
    "ORtg: ADF p-value =", round(adf_ortg_var$p.value, 4),
    ifelse(adf_ortg_var$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
)
ORtg: ADF p-value = 0.9233 (non-stationary) 
Code
cat(
    "Pace: ADF p-value =", round(adf_pace_var$p.value, 4),
    ifelse(adf_pace_var$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
)
Pace: ADF p-value = 0.8116 (non-stationary) 
Code
cat(
    "3PAr: ADF p-value =", round(adf_3par_var$p.value, 4),
    ifelse(adf_3par_var$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n\n"
)
3PAr: ADF p-value = 0.8303 (non-stationary) 
Code
# Difference if non-stationary
if (adf_ortg_var$p.value > 0.05 || adf_pace_var$p.value > 0.05 || adf_3par_var$p.value > 0.05) {
    var_data_diff <- diff(var_data)

    # Test differenced data
    adf_ortg_diff <- adf.test(var_data_diff[, "ORtg"])
    adf_pace_diff <- adf.test(var_data_diff[, "Pace"])
    adf_3par_diff <- adf.test(var_data_diff[, "3PAr"])

    cat("\nAfter first-differencing:\n")
    cat("ORtg: ADF p-value =", round(adf_ortg_diff$p.value, 4), "\n")
    cat("Pace: ADF p-value =", round(adf_pace_diff$p.value, 4), "\n")
    cat("3PAr: ADF p-value =", round(adf_3par_diff$p.value, 4), "\n")

    # Keep the first differences regardless of the post-differencing ADF results:
    # ORtg and Pace remain marginal, but a second difference would risk
    # over-differencing the already-short annual series.
    var_data_final <- var_data_diff
    differenced <- TRUE
} else {
    var_data_final <- var_data
    differenced <- FALSE
}

After first-differencing:
ORtg: ADF p-value = 0.109 
Pace: ADF p-value = 0.187 
3PAr: ADF p-value = 0.0446 
Code
# Determine optimal lag order
var_select <- VARselect(var_data_final, lag.max = 8, type = "const")

print(var_select$selection)
AIC(n)  HQ(n)  SC(n) FPE(n) 
     1      1      1      1 
Code
print(var_select$criteria)
                  1            2            3            4            5
AIC(n) -6.680901553 -6.500274539 -6.382532287 -6.281073692 -5.992741694
HQ(n)  -6.496671379 -6.177871734 -5.921956852 -5.682325625 -5.255820997
SC(n)  -6.153061907 -5.576555158 -5.062933172 -4.565594842 -3.881383109
FPE(n)  0.001258119  0.001525812  0.001768605  0.002072992  0.003049206
                  6            7            8
AIC(n) -6.029048961 -6.000186138 -5.898738412
HQ(n)  -5.153955634 -4.986920179 -4.747299824
SC(n)  -3.521810642 -3.097068084 -2.599740624
FPE(n)  0.003436313  0.004504422  0.007252063
Code
# Fit models with different lag orders
lags_to_fit <- unique(var_select$selection[1:3])

# Ensure we have at least one lag to fit
if (length(lags_to_fit) == 0 || any(is.na(lags_to_fit))) {
    lags_to_fit <- 1
}

cat("\nFitting VAR models with p =", paste(lags_to_fit, collapse = ", "), "\n")

Fitting VAR models with p = 1 
Code
var_models <- list()
for (p in lags_to_fit) {
    var_models[[paste0("VAR_", p)]] <- VAR(var_data_final, p = p, type = "const")
}
Code
for (name in names(var_models)) {
    cat("========================================\n")
    cat(name, "Summary:\n")
    cat("========================================\n\n")
    print(summary(var_models[[name]]))
    cat("\n\n")
}
========================================
VAR_1 Summary:
========================================


VAR Estimation Results:
========================= 
Endogenous variables: ORtg, Pace, X3PAr 
Deterministic variables: const 
Sample size: 43 
Log Likelihood: -24.745 
Roots of the characteristic polynomial:
0.3236 0.1471 0.1355
Call:
VAR(y = var_data_final, p = p, type = "const")


Estimation results for equation ORtg: 
===================================== 
ORtg = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

         Estimate Std. Error t value Pr(>|t|)  
ORtg.l1  -0.38350    0.15583  -2.461   0.0184 *
Pace.l1   0.17639    0.15459   1.141   0.2608  
X3PAr.l1 29.14282   14.02867   2.077   0.0444 *
const     0.02675    0.23554   0.114   0.9102  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.349 on 39 degrees of freedom
Multiple R-Squared: 0.1733, Adjusted R-squared: 0.1097 
F-statistic: 2.724 on 3 and 39 DF,  p-value: 0.05723 


Estimation results for equation Pace: 
===================================== 
Pace = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

         Estimate Std. Error t value Pr(>|t|)
ORtg.l1  -0.03026    0.16414  -0.184    0.855
Pace.l1  -0.11711    0.16283  -0.719    0.476
X3PAr.l1  6.57227   14.77673   0.445    0.659
const    -0.10617    0.24810  -0.428    0.671


Residual standard error: 1.421 on 39 degrees of freedom
Multiple R-Squared: 0.02201,    Adjusted R-squared: -0.05322 
F-statistic: 0.2925 on 3 and 39 DF,  p-value: 0.8305 


Estimation results for equation X3PAr: 
====================================== 
X3PAr = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

           Estimate Std. Error t value Pr(>|t|)   
ORtg.l1  -0.0008257  0.0018756  -0.440   0.6622   
Pace.l1   0.0011681  0.0018606   0.628   0.5338   
X3PAr.l1  0.1654261  0.1688517   0.980   0.3333   
const     0.0080410  0.0028350   2.836   0.0072 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.01623 on 39 degrees of freedom
Multiple R-Squared: 0.02972,    Adjusted R-squared: -0.04492 
F-statistic: 0.3982 on 3 and 39 DF,  p-value: 0.7551 



Covariance matrix of residuals:
          ORtg      Pace      X3PAr
ORtg  1.818853  0.407221  0.0052772
Pace  0.407221  2.018000 -0.0020829
X3PAr 0.005277 -0.002083  0.0002635

Correlation matrix of residuals:
        ORtg     Pace    X3PAr
ORtg  1.0000  0.21255  0.24106
Pace  0.2126  1.00000 -0.09033
X3PAr 0.2411 -0.09033  1.00000
Code
# Split: Train 1980-2019, Test 2020-2024
train_end_var <- 2019

# The split is identical whether or not the series were differenced;
# window() handles the one-year offset of the differenced series automatically.
train_var <- window(var_data_final, end = train_end_var)
test_var <- window(var_data_final, start = train_end_var + 1)

h_var <- nrow(test_var)

# Fit VAR models on training data with error handling
var_train_models <- list()
for (p in lags_to_fit) {
    model <- tryCatch(
        {
            VAR(train_var, p = p, type = "const")
        },
        error = function(e) {
            if (p > 1) {
                VAR(train_var, p = 1, type = "const")
            } else {
                NULL
            }
        }
    )
    if (!is.null(model)) {
        var_train_models[[paste0("VAR_", p)]] <- model
    }
}

# Generate forecasts
rmse_results <- data.frame()

cat("Evaluating", length(var_train_models), "VAR model(s)\n")
Evaluating 1 VAR model(s)
Code
for (name in names(var_train_models)) {
    fc <- tryCatch(
        {
            predict(var_train_models[[name]], n.ahead = h_var)
        },
        error = function(e) {
            cat("Warning: Forecast failed for", name, ":", e$message, "\n")
            NULL
        }
    )

    if (is.null(fc)) next

    fc_var_names <- names(fc$fcst)

    # Find ORtg
    fc_ortg <- fc$fcst$ORtg[, "fcst"]

    # Find Pace
    fc_pace <- fc$fcst$Pace[, "fcst"]

    # Find 3PAr (VAR() sanitizes non-syntactic names, so `3PAr` typically appears as "X3PAr")
    tpar_fc_name <- fc_var_names[grep("3PAr|PAr", fc_var_names, ignore.case = TRUE)]
    if (length(tpar_fc_name) == 0) {
        tpar_fc_name <- fc_var_names[3] # Default to third variable
    } else {
        tpar_fc_name <- tpar_fc_name[1] # Take first match
    }
    fc_3par <- fc$fcst[[tpar_fc_name]][, "fcst"]

    test_var_names <- colnames(test_var)
    test_ortg_vec <- as.numeric(test_var[, "ORtg"])
    test_pace_vec <- as.numeric(test_var[, "Pace"])

    tpar_test_name <- test_var_names[grep("3PAr|PAr", test_var_names, ignore.case = TRUE)]
    if (length(tpar_test_name) == 0) {
        tpar_test_name <- test_var_names[3]
    } else {
        tpar_test_name <- tpar_test_name[1]
    }
    test_3par_vec <- as.numeric(test_var[, tpar_test_name])

    # Ensure equal lengths (forecasts might be shorter if h_var is large)
    n_compare <- min(length(test_ortg_vec), length(fc_ortg), length(fc_pace), length(fc_3par))

    # Calculate RMSE for each variable
    rmse_ortg <- sqrt(mean((test_ortg_vec[1:n_compare] - fc_ortg[1:n_compare])^2, na.rm = TRUE))
    rmse_pace <- sqrt(mean((test_pace_vec[1:n_compare] - fc_pace[1:n_compare])^2, na.rm = TRUE))
    rmse_3par <- sqrt(mean((test_3par_vec[1:n_compare] - fc_3par[1:n_compare])^2, na.rm = TRUE))

    # Average RMSE across variables
    rmse_avg <- mean(c(rmse_ortg, rmse_pace, rmse_3par), na.rm = TRUE)

    cat("  ", name, ": ORtg RMSE =", round(rmse_ortg, 4),
        "| Pace RMSE =", round(rmse_pace, 4),
        "| 3PAr RMSE =", round(rmse_3par, 4),
        "| Avg =", round(rmse_avg, 4), "\n")

    # Create descriptive model name
    lag_num <- as.numeric(gsub("VAR_", "", name))
    model_label <- paste0("VAR(", lag_num, ")")

    rmse_results <- rbind(rmse_results, data.frame(
        Model = model_label,
        Lags = lag_num,
        RMSE_ORtg = rmse_ortg,
        RMSE_Pace = rmse_pace,
        RMSE_3PAr = rmse_3par,
        RMSE_Avg = rmse_avg
    ))
}
   VAR_1 : ORtg RMSE = 1.3634 | Pace RMSE = 0.8628 | 3PAr RMSE = 0.0126 | Avg = 0.7463 
Code
cat("\n=== CROSS-VALIDATION RESULTS ===\n\n")

=== CROSS-VALIDATION RESULTS ===
Cross-Validation Results: VAR Models (Test Set: 2020-2024)
Model    Lags   RMSE (ORtg)   RMSE (Pace)   RMSE (3PAr)   Avg RMSE
VAR(1)      1        1.3634        0.8628        0.0126     0.7463
Code
if (exists("rmse_results") && nrow(rmse_results) > 0) {
    # Plot RMSEs
    ggplot(rmse_results, aes(x = factor(Lags), y = RMSE_Avg, fill = Model)) +
        geom_bar(stat = "identity", width = 0.6) +
        geom_text(aes(label = round(RMSE_Avg, 3)), vjust = -0.5, fontface = "bold") +
        labs(
            title = "VAR Model Cross-Validation: Average RMSE",
            subtitle = "Lower RMSE = Better out-of-sample forecast performance",
            x = "Number of Lags (p)",
            y = "Average RMSE across ORtg, Pace, 3PAr"
        ) +
        theme_minimal() +
        theme(legend.position = "none") +
        scale_fill_brewer(palette = "Set2")
}

Code
if (exists("rmse_results") && nrow(rmse_results) > 0) {
    best_var_idx <- which.min(rmse_results$RMSE_Avg)
    cat("\nBest model:", rmse_results$Model[best_var_idx], "(Average RMSE =", round(rmse_results$RMSE_Avg[best_var_idx], 3), ")\n")
} else {
    best_var_idx <- 1
}

Best model: VAR(1) (Average RMSE = 0.746 )
Code
# Select best model
if (exists("rmse_results") && nrow(rmse_results) > 0 && exists("best_var_idx")) {
    best_var_name <- rmse_results$Model[best_var_idx]
    best_var_lags <- rmse_results$Lags[best_var_idx]
} else {
    best_var_lags <- min(lags_to_fit)
}

cat("Final VAR Model: VAR(", best_var_lags, ")\n\n", sep = "")
Final VAR Model: VAR(1)
Code
# Refit on full data
final_var <- tryCatch(
    {
        VAR(var_data_final, p = best_var_lags, type = "const")
    },
    error = function(e) {
        VAR(var_data_final, p = 1, type = "const")
    }
)

print(summary(final_var))

VAR Estimation Results:
========================= 
Endogenous variables: ORtg, Pace, X3PAr 
Deterministic variables: const 
Sample size: 43 
Log Likelihood: -24.745 
Roots of the characteristic polynomial:
0.3236 0.1471 0.1355
Call:
VAR(y = var_data_final, p = best_var_lags, type = "const")


Estimation results for equation ORtg: 
===================================== 
ORtg = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

         Estimate Std. Error t value Pr(>|t|)  
ORtg.l1  -0.38350    0.15583  -2.461   0.0184 *
Pace.l1   0.17639    0.15459   1.141   0.2608  
X3PAr.l1 29.14282   14.02867   2.077   0.0444 *
const     0.02675    0.23554   0.114   0.9102  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.349 on 39 degrees of freedom
Multiple R-Squared: 0.1733, Adjusted R-squared: 0.1097 
F-statistic: 2.724 on 3 and 39 DF,  p-value: 0.05723 


Estimation results for equation Pace: 
===================================== 
Pace = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

         Estimate Std. Error t value Pr(>|t|)
ORtg.l1  -0.03026    0.16414  -0.184    0.855
Pace.l1  -0.11711    0.16283  -0.719    0.476
X3PAr.l1  6.57227   14.77673   0.445    0.659
const    -0.10617    0.24810  -0.428    0.671


Residual standard error: 1.421 on 39 degrees of freedom
Multiple R-Squared: 0.02201,    Adjusted R-squared: -0.05322 
F-statistic: 0.2925 on 3 and 39 DF,  p-value: 0.8305 


Estimation results for equation X3PAr: 
====================================== 
X3PAr = ORtg.l1 + Pace.l1 + X3PAr.l1 + const 

           Estimate Std. Error t value Pr(>|t|)   
ORtg.l1  -0.0008257  0.0018756  -0.440   0.6622   
Pace.l1   0.0011681  0.0018606   0.628   0.5338   
X3PAr.l1  0.1654261  0.1688517   0.980   0.3333   
const     0.0080410  0.0028350   2.836   0.0072 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.01623 on 39 degrees of freedom
Multiple R-Squared: 0.02972,    Adjusted R-squared: -0.04492 
F-statistic: 0.3982 on 3 and 39 DF,  p-value: 0.7551 



Covariance matrix of residuals:
          ORtg      Pace      X3PAr
ORtg  1.818853  0.407221  0.0052772
Pace  0.407221  2.018000 -0.0020829
X3PAr 0.005277 -0.002083  0.0002635

Correlation matrix of residuals:
        ORtg     Pace    X3PAr
ORtg  1.0000  0.21255  0.24106
Pace  0.2126  1.00000 -0.09033
X3PAr 0.2411 -0.09033  1.00000
Code
# Forecast 5 periods ahead
fc_var_final <- predict(final_var, n.ahead = 5)

# Get variable names
fc_var_names <- names(fc_var_final$fcst)
tpar_fc_name <- fc_var_names[grep("3PAr|PAr", fc_var_names, ignore.case = TRUE)]
if (length(tpar_fc_name) == 0) tpar_fc_name <- fc_var_names[3]

# Display forecasts
cat("\n=== 5-Period Forecasts ===\n\n")

=== 5-Period Forecasts ===
Code
cat("ORtg:\n")
ORtg:
Code
print(fc_var_final$fcst$ORtg)
            fcst     lower    upper       CI
[1,]  1.14541147 -1.497891 3.788714 2.643302
[2,] -0.01197986 -2.904812 2.880852 2.892832
[3,]  0.29428871 -2.615367 3.203944 2.909656
[4,]  0.18514537 -2.726620 3.096911 2.911765
[5,]  0.21921670 -2.692756 3.131190 2.911973
Code
cat("\n", tpar_fc_name, ":\n", sep = "")

X3PAr:
Code
print(fc_var_final$fcst[[tpar_fc_name]])
            fcst       lower      upper         CI
[1,] 0.013430804 -0.01838448 0.04524608 0.03181528
[2,] 0.009377456 -0.02292743 0.04168234 0.03230489
[3,] 0.009533668 -0.02277694 0.04184428 0.03231061
[4,] 0.009331515 -0.02297983 0.04164286 0.03231135
[5,] 0.009375650 -0.02293573 0.04168703 0.03231138
Code
cat("\nPace:\n")

Pace:
Code
print(fc_var_final$fcst$Pace)
            fcst     lower    upper       CI
[1,]  0.05172503 -2.732528 2.835978 2.784253
[2,] -0.05861195 -2.873542 2.756318 2.814930
[3,] -0.03730962 -2.852842 2.778223 2.815533
[4,] -0.04804480 -2.863606 2.767516 2.815561
[5,] -0.04481374 -2.860377 2.770749 2.815563
Code
# Plot forecasts
for (vname in fc_var_names) {
    plot(fc_var_final, names = vname)
}

Granger Causality Tests:

Code
# Granger causality tests
var_names <- names(final_var$varresult)
tpar_name <- var_names[grep("3PAr|PAr", var_names, ignore.case = TRUE)]
if (length(tpar_name) == 0) tpar_name <- var_names[3]

granger_3par_ortg <- causality(final_var, cause = tpar_name)$Granger
granger_pace <- causality(final_var, cause = "Pace")$Granger
granger_ortg <- causality(final_var, cause = "ORtg")$Granger

cat("3PAr → {ORtg, Pace}: F =", round(granger_3par_ortg$statistic, 3),
    ", p =", round(granger_3par_ortg$p.value, 4), "\n")
3PAr → {ORtg, Pace}: F = 2.158 , p = 0.1202 
Code
cat("Pace → {ORtg, 3PAr}: F =", round(granger_pace$statistic, 3),
    ", p =", round(granger_pace$p.value, 4), "\n")
Pace → {ORtg, 3PAr}: F = 0.717 , p = 0.4903 
Code
cat("ORtg → {Pace, 3PAr}: F =", round(granger_ortg$statistic, 3),
    ", p =", round(granger_ortg$p.value, 4), "\n")
ORtg → {Pace, 3PAr}: F = 0.122 , p = 0.8851 

Granger causality tests probe the temporal ordering of the analytics revolution by determining whether one variable's past values help predict another variable's future changes. These tests address whether the rise in three-point shooting preceded changes in offensive efficiency and pace, or whether successful offenses drove teams to adopt more three-pointers. Note that Granger causality establishes predictive precedence rather than true causation, and here none of the three block tests is significant at the 5% level, suggesting that at annual frequency the co-movement among these variables is largely contemporaneous.
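Under the hood, a Granger test is a nested-model F-test: do lags of x improve the prediction of y beyond y's own lags? A base-R sketch with simulated data (the `causality()` call above additionally tests all equations of the system jointly):

```r
# Simulate a pair where x Granger-causes y but not vice versa
set.seed(42)
n <- 150
x <- as.numeric(arima.sim(list(ar = 0.5), n))
y <- numeric(n)
for (t in 2:n) y[t] <- 0.4 * y[t - 1] + 0.5 * x[t - 1] + rnorm(1)

yl <- y[-1] # y at time t
y1 <- y[-n] # y at time t-1
x1 <- x[-n] # x at time t-1

restricted <- lm(yl ~ y1)        # y's own lag only
unrestricted <- lm(yl ~ y1 + x1) # add lagged x
anova(restricted, unrestricted)  # small p-value: lagged x helps predict y
```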

Code
var_names_irf <- names(final_var$varresult)
tpar_name_irf <- var_names_irf[grep("3PAr|PAr", var_names_irf, ignore.case = TRUE)]
if (length(tpar_name_irf) == 0) tpar_name_irf <- var_names_irf[3]

# IRFs
irf_3par_ortg <- irf(final_var, impulse = tpar_name_irf, response = "ORtg", n.ahead = 10)
plot(irf_3par_ortg, main = paste("Impulse:", tpar_name_irf, "→ Response: ORtg"))

Code
irf_pace_ortg <- irf(final_var, impulse = "Pace", response = "ORtg", n.ahead = 10)
plot(irf_pace_ortg, main = "Impulse: Pace → Response: ORtg")

Code
irf_ortg_3par <- irf(final_var, impulse = "ORtg", response = tpar_name_irf, n.ahead = 10)
plot(irf_ortg_3par, main = paste("Impulse: ORtg → Response:", tpar_name_irf))

The fitted VAR(1) reveals modest relationships among offensive efficiency, three-point shooting, and pace: lagged changes in 3PAr predict changes in ORtg (p ≈ 0.044), while the remaining cross-lag coefficients are not statistically significant. Granger causality tests quantify whether past values of one variable help predict future values of others, addressing the question of whether the rise in three-point shooting preceded changes in offensive efficiency or vice versa.

Impulse response functions trace how a one-time shock to one variable cascades through the system, showing whether innovations in one aspect of basketball strategy have lasting effects on others or whether defensive adjustments eventually neutralize advantages. Together, these tools demonstrate that while the analytics revolution transformed the NBA, no single variable unilaterally drove the others at the annual level, reflecting the complex adaptive nature of elite competition, where strategic innovations provoke counter-responses.
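For a VAR(1), the reduced-form impulse responses have a closed form: the response h periods after a shock δ is A^h δ, where A is the lag-coefficient matrix. (By default, `irf()` from the vars package additionally orthogonalizes shocks via a Cholesky factor of the residual covariance.) A minimal base-R sketch with an illustrative stationary A, not the fitted coefficients:

```r
# Reduced-form IRF for a VAR(1): response at horizon h is A^h %*% shock
A <- matrix(c(
    0.5, 0.3,
    0.2, 0.6
), nrow = 2, byrow = TRUE)

shock <- c(1, 0) # one-unit shock to the first variable
horizons <- 10
irf_path <- matrix(NA_real_, horizons + 1, 2)
resp <- shock
for (h in 0:horizons) {
    irf_path[h + 1, ] <- resp
    resp <- as.numeric(A %*% resp) # propagate one more period
}
round(irf_path, 3) # responses decay to zero since both eigenvalues of A are < 1
```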

Pace Dynamics (VAR)

Variables: Pace, 3PAr, eFG%

This VAR model investigates the bidirectional relationships between pace of play, three-point attempt rate, and effective field goal percentage. The theoretical motivation is multifaceted: faster pace may facilitate more three-point attempts through transition opportunities, while teams that shoot more threes may adopt faster tempos to maximize possessions. Additionally, better shooting efficiency (eFG%) may enable teams to control tempo more effectively, while faster pace could create higher-quality shot opportunities through defensive breakdowns. Unlike ARIMAX models that assume one-directional causality, VAR allows all three variables to influence each other dynamically.
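For reference, eFG% adjusts raw field goal percentage by crediting made threes with 1.5 times the weight of a two, reflecting the extra point they score. A small helper with hypothetical example values:

```r
# eFG% = (FGM + 0.5 * 3PM) / FGA
efg <- function(fgm, fg3m, fga) {
    (fgm + 0.5 * fg3m) / fga
}

# Hypothetical team line: 40 makes, 12 of them threes, on 85 attempts
efg(40, 12, 85) # = 46/85, about 0.541, versus a raw FG% of 40/85, about 0.471
```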

Code
# Create VAR dataset for pace dynamics
var_pace_data <- ts(league_avg %>% dplyr::select(Pace, `3PAr`, `eFG%`),
    start = 1980, frequency = 1
)

# Plot all series
autoplot(var_pace_data, facets = TRUE) +
    labs(
        title = "VAR Variables: Pace, 3PAr, eFG% (1980-2025)",
        x = "Year", y = "Value"
    ) +
    theme_minimal()

Code
cat("Summary Statistics:\n")
Summary Statistics:
Code
summary(var_pace_data)
      Pace             3PAr              eFG%       
 Min.   : 88.92   Min.   :0.02292   Min.   :0.4659  
 1st Qu.: 91.81   1st Qu.:0.08761   1st Qu.:0.4877  
 Median : 95.77   Median :0.19584   Median :0.4945  
 Mean   : 95.65   Mean   :0.19578   Mean   :0.4981  
 3rd Qu.: 99.18   3rd Qu.:0.25958   3rd Qu.:0.5009  
 Max.   :103.06   Max.   :0.42119   Max.   :0.5465  
Code
# ADF tests for each series
adf_pace_var2 <- adf.test(var_pace_data[, "Pace"])
adf_3par_var2 <- adf.test(var_pace_data[, "3PAr"])
adf_efg_var2 <- adf.test(var_pace_data[, "eFG%"])

cat(
    "Pace: ADF p-value =", round(adf_pace_var2$p.value, 4),
    ifelse(adf_pace_var2$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
)
Pace: ADF p-value = 0.8116 (non-stationary) 
Code
cat(
    "3PAr: ADF p-value =", round(adf_3par_var2$p.value, 4),
    ifelse(adf_3par_var2$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
)
3PAr: ADF p-value = 0.8303 (non-stationary) 
Code
cat(
    "eFG%: ADF p-value =", round(adf_efg_var2$p.value, 4),
    ifelse(adf_efg_var2$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n\n"
)
eFG%: ADF p-value = 0.9072 (non-stationary) 
Code
# Difference if non-stationary
if (adf_pace_var2$p.value > 0.05 || adf_3par_var2$p.value > 0.05 || adf_efg_var2$p.value > 0.05) {
    var_pace_data_diff <- diff(var_pace_data)

    # Test differenced data
    adf_pace_diff <- adf.test(var_pace_data_diff[, "Pace"])
    adf_3par_diff <- adf.test(var_pace_data_diff[, "3PAr"])
    adf_efg_diff <- adf.test(var_pace_data_diff[, "eFG%"])

    cat("After first-differencing:\n")
    cat("Pace: ADF p-value =", round(adf_pace_diff$p.value, 4), "\n")
    cat("3PAr: ADF p-value =", round(adf_3par_diff$p.value, 4), "\n")
    cat("eFG%: ADF p-value =", round(adf_efg_diff$p.value, 4), "\n")

    # Keep the first differences regardless of the post-differencing ADF results:
    # Pace remains marginal, but a second difference would risk over-differencing
    # the already-short annual series.
    var_pace_data_final <- var_pace_data_diff
    differenced_pace <- TRUE
} else {
    var_pace_data_final <- var_pace_data
    differenced_pace <- FALSE
}
After first-differencing:
Pace: ADF p-value = 0.187 
3PAr: ADF p-value = 0.0446 
eFG%: ADF p-value = 0.0333 

The time series reveal distinct evolutionary patterns. Pace shows the famous U-shape: declining from the fast-paced 1980s through the defensive mid-2000s, then rebounding in the modern era as analytics embraced tempo. Three-point attempt rate explodes post-2012, capturing the shot selection revolution. Effective field goal percentage rises steadily, reflecting improved shooting skill and better shot selection. All three series exhibit non-stationary trends, necessitating differencing for valid VAR estimation.
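First-differencing turns a trending level series into a (roughly) mean-stationary series of year-over-year changes, which is what the VAR below is actually fit to. A quick base-R check on a simulated upward-drifting series (assumed illustrative slope, not the real Pace data):

```r
# A level series with a deterministic trend of +0.25 per year plus noise
set.seed(7)
years <- 1980:2024
level <- 90 + 0.25 * (years - 1980) + rnorm(length(years), sd = 0.5)

change <- diff(level) # first differences: one fewer observation

# The trend moves from the mean path of the levels into the mean of the changes
c(
    fitted_slope = unname(coef(lm(level ~ years))[2]), # approx 0.25
    mean_change = mean(change) # also approx 0.25, but the series is now flat
)
```

This is also why interpretation changes after differencing: the VAR coefficients below describe year-over-year changes, not levels.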

Code
# Determine optimal lag order
var_pace_select <- VARselect(var_pace_data_final, lag.max = 8, type = "const")

print(var_pace_select$selection)
AIC(n)  HQ(n)  SC(n) FPE(n) 
     1      1      1      1 
Code
print(var_pace_select$criteria)
                   1             2             3             4             5
AIC(n) -1.742587e+01 -1.725714e+01 -1.709067e+01 -1.689179e+01 -1.661036e+01
HQ(n)  -1.724164e+01 -1.693473e+01 -1.663009e+01 -1.629304e+01 -1.587344e+01
SC(n)  -1.689803e+01 -1.633342e+01 -1.577107e+01 -1.517631e+01 -1.449900e+01
FPE(n)  2.711707e-08  3.249790e-08  3.955006e-08  5.110035e-08  7.464754e-08
                   6             7             8
AIC(n) -1.659041e+01 -1.642678e+01 -1.653350e+01
HQ(n)  -1.571532e+01 -1.541351e+01 -1.538207e+01
SC(n)  -1.408317e+01 -1.352366e+01 -1.323451e+01
FPE(n)  8.899211e-08  1.334835e-07  1.745185e-07
Code
# Fit models with different lag orders
lags_pace_to_fit <- unique(var_pace_select$selection[1:3])

# Ensure we have at least one lag to fit
if (length(lags_pace_to_fit) == 0 || any(is.na(lags_pace_to_fit))) {
    lags_pace_to_fit <- 1
}

cat("\nFitting VAR models with p =", paste(lags_pace_to_fit, collapse = ", "), "\n")

Fitting VAR models with p = 1 
Code
var_pace_models <- list()
for (p in lags_pace_to_fit) {
    var_pace_models[[paste0("VAR_", p)]] <- VAR(var_pace_data_final, p = p, type = "const")
}
Code
for (name in names(var_pace_models)) {
    cat("========================================\n")
    cat(name, "Summary:\n")
    cat("========================================\n\n")
    print(summary(var_pace_models[[name]]))
    cat("\n\n")
}
========================================
VAR_1 Summary:
========================================


VAR Estimation Results:
========================= 
Endogenous variables: Pace, X3PAr, eFG. 
Deterministic variables: const 
Sample size: 43 
Log Likelihood: 208.868 
Roots of the characteristic polynomial:
0.3072 0.1708 0.1454
Call:
VAR(y = var_pace_data_final, p = p, type = "const")


Estimation results for equation Pace: 
===================================== 
Pace = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

         Estimate Std. Error t value Pr(>|t|)
Pace.l1   -0.1350     0.1729  -0.781    0.440
X3PAr.l1   4.3987    16.3643   0.269    0.790
eFG..l1    6.0838    38.5944   0.158    0.876
const     -0.1038     0.2486  -0.418    0.679


Residual standard error: 1.421 on 39 degrees of freedom
Multiple R-Squared: 0.02178,    Adjusted R-squared: -0.05347 
F-statistic: 0.2894 on 3 and 39 DF,  p-value: 0.8327 


Estimation results for equation X3PAr: 
====================================== 
X3PAr = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

          Estimate Std. Error t value Pr(>|t|)   
Pace.l1  0.0009159  0.0019800   0.463  0.64624   
X3PAr.l1 0.1347711  0.1874200   0.719  0.47637   
eFG..l1  0.0343928  0.4420217   0.078  0.93838   
const    0.0080525  0.0028473   2.828  0.00735 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.01627 on 39 degrees of freedom
Multiple R-Squared: 0.02505,    Adjusted R-squared: -0.04995 
F-statistic: 0.334 on 3 and 39 DF,  p-value: 0.8008 


Estimation results for equation eFG.: 
===================================== 
eFG. = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

           Estimate Std. Error t value Pr(>|t|)  
Pace.l1   7.028e-04  8.245e-04   0.852   0.3992  
X3PAr.l1  1.801e-01  7.805e-02   2.308   0.0264 *
eFG..l1  -2.816e-01  1.841e-01  -1.530   0.1341  
const     4.086e-06  1.186e-03   0.003   0.9973  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.006776 on 39 degrees of freedom
Multiple R-Squared: 0.1236, Adjusted R-squared: 0.05617 
F-statistic: 1.833 on 3 and 39 DF,  p-value: 0.1571 



Covariance matrix of residuals:
           Pace      X3PAr      eFG.
Pace   2.018472 -2.042e-03 3.238e-03
X3PAr -0.002042  2.648e-04 4.819e-05
eFG.   0.003238  4.819e-05 4.591e-05

Correlation matrix of residuals:
          Pace    X3PAr   eFG.
Pace   1.00000 -0.08834 0.3364
X3PAr -0.08834  1.00000 0.4371
eFG.   0.33640  0.43708 1.0000
Code
# Split: Train 1980-2019, Test 2020-2024
train_end_var_pace <- 2019

# The split is identical whether or not the series were differenced;
# window() handles the one-year offset of the differenced series automatically.
train_var_pace <- window(var_pace_data_final, end = train_end_var_pace)
test_var_pace <- window(var_pace_data_final, start = train_end_var_pace + 1)

h_var_pace <- nrow(test_var_pace)

# Fit VAR models on training data with error handling
var_pace_train_models <- list()
for (p in lags_pace_to_fit) {
    model <- tryCatch(
        {
            VAR(train_var_pace, p = p, type = "const")
        },
        error = function(e) {
            if (p > 1) {
                VAR(train_var_pace, p = 1, type = "const")
            } else {
                NULL
            }
        }
    )
    if (!is.null(model)) {
        var_pace_train_models[[paste0("VAR_", p)]] <- model
    }
}

# Generate forecasts
rmse_pace_results <- data.frame()

cat("Evaluating", length(var_pace_train_models), "VAR model(s)\n")
Evaluating 1 VAR model(s)
Code
for (name in names(var_pace_train_models)) {
    fc <- tryCatch(
        {
            predict(var_pace_train_models[[name]], n.ahead = h_var_pace)
        },
        error = function(e) {
            cat("Warning: Forecast failed for", name, ":", e$message, "\n")
            NULL
        }
    )

    if (is.null(fc)) next

    # Extract forecasts for each variable
    fc_var_names_pace <- names(fc$fcst)

    # Find Pace
    fc_pace_var <- fc$fcst$Pace[, "fcst"]

    # Find 3PAr
    tpar_fc_name_pace <- fc_var_names_pace[grep("3PAr|PAr", fc_var_names_pace, ignore.case = TRUE)]
    if (length(tpar_fc_name_pace) == 0) {
        tpar_fc_name_pace <- fc_var_names_pace[2]
    } else {
        tpar_fc_name_pace <- tpar_fc_name_pace[1]
    }
    fc_3par_pace <- fc$fcst[[tpar_fc_name_pace]][, "fcst"]

    # Find eFG%
    efg_fc_name_pace <- fc_var_names_pace[grep("eFG%|eFG", fc_var_names_pace, ignore.case = TRUE)]
    if (length(efg_fc_name_pace) == 0) {
        efg_fc_name_pace <- fc_var_names_pace[3]
    } else {
        efg_fc_name_pace <- efg_fc_name_pace[1]
    }
    fc_efg_pace <- fc$fcst[[efg_fc_name_pace]][, "fcst"]

    # Convert test data to numeric vectors
    test_var_names_pace <- colnames(test_var_pace)
    test_pace_vec <- as.numeric(test_var_pace[, "Pace"])

    tpar_test_name_pace <- test_var_names_pace[grep("3PAr|PAr", test_var_names_pace, ignore.case = TRUE)]
    if (length(tpar_test_name_pace) == 0) {
        tpar_test_name_pace <- test_var_names_pace[2]
    } else {
        tpar_test_name_pace <- tpar_test_name_pace[1]
    }
    test_3par_pace <- as.numeric(test_var_pace[, tpar_test_name_pace])

    efg_test_name_pace <- test_var_names_pace[grep("eFG%|eFG", test_var_names_pace, ignore.case = TRUE)]
    if (length(efg_test_name_pace) == 0) {
        efg_test_name_pace <- test_var_names_pace[3]
    } else {
        efg_test_name_pace <- efg_test_name_pace[1]
    }
    test_efg_pace <- as.numeric(test_var_pace[, efg_test_name_pace])

    # Ensure equal lengths
    n_compare <- min(length(test_pace_vec), length(fc_pace_var), length(fc_3par_pace), length(fc_efg_pace))

    # Calculate RMSE for each variable
    rmse_pace <- sqrt(mean((test_pace_vec[1:n_compare] - fc_pace_var[1:n_compare])^2, na.rm = TRUE))
    rmse_3par_pace <- sqrt(mean((test_3par_pace[1:n_compare] - fc_3par_pace[1:n_compare])^2, na.rm = TRUE))
    rmse_efg_pace <- sqrt(mean((test_efg_pace[1:n_compare] - fc_efg_pace[1:n_compare])^2, na.rm = TRUE))

    # Average RMSE across variables
    rmse_avg_pace <- mean(c(rmse_pace, rmse_3par_pace, rmse_efg_pace), na.rm = TRUE)

    cat("  ", name, ": Pace RMSE =", round(rmse_pace, 4),
        "| 3PAr RMSE =", round(rmse_3par_pace, 4),
        "| eFG% RMSE =", round(rmse_efg_pace, 4),
        "| Avg =", round(rmse_avg_pace, 4), "\n")

    # Create descriptive model name
    lag_num <- as.numeric(gsub("VAR_", "", name))
    model_label <- paste0("VAR(", lag_num, ")")

    rmse_pace_results <- rbind(rmse_pace_results, data.frame(
        Model = model_label,
        Lags = lag_num,
        RMSE_Pace = rmse_pace,
        RMSE_3PAr = rmse_3par_pace,
        RMSE_eFG = rmse_efg_pace,
        RMSE_Avg = rmse_avg_pace
    ))
}
   VAR_1 : Pace RMSE = 0.8628 | 3PAr RMSE = 0.0125 | eFG% RMSE = 0.0074 | Avg = 0.2942 
Code
cat("\n=== CROSS-VALIDATION RESULTS ===\n\n")

=== CROSS-VALIDATION RESULTS ===
Cross-Validation Results: Pace VAR Models (Test Set: 2020-2024)
Model    Lags   RMSE (Pace)   RMSE (3PAr)   RMSE (eFG%)   Avg RMSE
VAR(1)      1        0.8628        0.0125        0.0074     0.2942
Code
if (exists("rmse_pace_results") && nrow(rmse_pace_results) > 0) {
    # Plot RMSEs
    ggplot(rmse_pace_results, aes(x = factor(Lags), y = RMSE_Avg, fill = Model)) +
        geom_bar(stat = "identity", width = 0.6) +
        geom_text(aes(label = round(RMSE_Avg, 3)), vjust = -0.5, fontface = "bold") +
        labs(
            title = "Pace VAR Model Cross-Validation: Average RMSE",
            subtitle = "Lower RMSE = Better out-of-sample forecast performance",
            x = "Number of Lags (p)",
            y = "Average RMSE across Pace, 3PAr, eFG%"
        ) +
        theme_minimal() +
        theme(legend.position = "none") +
        scale_fill_brewer(palette = "Set2")
}

Code
if (exists("rmse_pace_results") && nrow(rmse_pace_results) > 0) {
    best_var_pace_idx <- which.min(rmse_pace_results$RMSE_Avg)
    cat("\nBest model:", rmse_pace_results$Model[best_var_pace_idx], "(Average RMSE =", round(rmse_pace_results$RMSE_Avg[best_var_pace_idx], 3), ")\n")
} else {
    best_var_pace_idx <- 1
}

Best model: VAR(1) (Average RMSE = 0.294 )
Code
# Select best model
if (exists("rmse_pace_results") && nrow(rmse_pace_results) > 0 && exists("best_var_pace_idx")) {
    best_var_pace_name <- rmse_pace_results$Model[best_var_pace_idx]
    best_var_pace_lags <- rmse_pace_results$Lags[best_var_pace_idx]
} else {
    best_var_pace_lags <- min(lags_pace_to_fit)
}

cat("Final VAR Model: VAR(", best_var_pace_lags, ")\n\n", sep = "")
Final VAR Model: VAR(1)
Code
# Refit on full data
final_var_pace <- tryCatch(
    {
        VAR(var_pace_data_final, p = best_var_pace_lags, type = "const")
    },
    error = function(e) {
        VAR(var_pace_data_final, p = 1, type = "const")
    }
)

print(summary(final_var_pace))

VAR Estimation Results:
========================= 
Endogenous variables: Pace, X3PAr, eFG. 
Deterministic variables: const 
Sample size: 43 
Log Likelihood: 208.868 
Roots of the characteristic polynomial:
0.3072 0.1708 0.1454
Call:
VAR(y = var_pace_data_final, p = best_var_pace_lags, type = "const")


Estimation results for equation Pace: 
===================================== 
Pace = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

         Estimate Std. Error t value Pr(>|t|)
Pace.l1   -0.1350     0.1729  -0.781    0.440
X3PAr.l1   4.3987    16.3643   0.269    0.790
eFG..l1    6.0838    38.5944   0.158    0.876
const     -0.1038     0.2486  -0.418    0.679


Residual standard error: 1.421 on 39 degrees of freedom
Multiple R-Squared: 0.02178,    Adjusted R-squared: -0.05347 
F-statistic: 0.2894 on 3 and 39 DF,  p-value: 0.8327 


Estimation results for equation X3PAr: 
====================================== 
X3PAr = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

          Estimate Std. Error t value Pr(>|t|)   
Pace.l1  0.0009159  0.0019800   0.463  0.64624   
X3PAr.l1 0.1347711  0.1874200   0.719  0.47637   
eFG..l1  0.0343928  0.4420217   0.078  0.93838   
const    0.0080525  0.0028473   2.828  0.00735 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.01627 on 39 degrees of freedom
Multiple R-Squared: 0.02505,    Adjusted R-squared: -0.04995 
F-statistic: 0.334 on 3 and 39 DF,  p-value: 0.8008 


Estimation results for equation eFG.: 
===================================== 
eFG. = Pace.l1 + X3PAr.l1 + eFG..l1 + const 

           Estimate Std. Error t value Pr(>|t|)  
Pace.l1   7.028e-04  8.245e-04   0.852   0.3992  
X3PAr.l1  1.801e-01  7.805e-02   2.308   0.0264 *
eFG..l1  -2.816e-01  1.841e-01  -1.530   0.1341  
const     4.086e-06  1.186e-03   0.003   0.9973  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.006776 on 39 degrees of freedom
Multiple R-Squared: 0.1236, Adjusted R-squared: 0.05617 
F-statistic: 1.833 on 3 and 39 DF,  p-value: 0.1571 



Covariance matrix of residuals:
           Pace      X3PAr      eFG.
Pace   2.018472 -2.042e-03 3.238e-03
X3PAr -0.002042  2.648e-04 4.819e-05
eFG.   0.003238  4.819e-05 4.591e-05

Correlation matrix of residuals:
          Pace    X3PAr   eFG.
Pace   1.00000 -0.08834 0.3364
X3PAr -0.08834  1.00000 0.4371
eFG.   0.33640  0.43708 1.0000
Code
# Forecast 5 periods ahead
fc_var_pace_final <- predict(final_var_pace, n.ahead = 5)

# Get variable names
fc_var_pace_names <- names(fc_var_pace_final$fcst)
tpar_fc_name_final <- fc_var_pace_names[grep("3PAr|PAr", fc_var_pace_names, ignore.case = TRUE)]
if (length(tpar_fc_name_final) == 0) tpar_fc_name_final <- fc_var_pace_names[2]

efg_fc_name_final <- fc_var_pace_names[grep("eFG%|eFG", fc_var_pace_names, ignore.case = TRUE)]
if (length(efg_fc_name_final) == 0) efg_fc_name_final <- fc_var_pace_names[3]

# Display forecasts
cat("\n=== 5-Period Forecasts ===\n\n")

=== 5-Period Forecasts ===
Code
cat("Pace:\n")
Pace:
Code
print(fc_var_pace_final$fcst$Pace)
             fcst     lower    upper       CI
[1,] -0.053381013 -2.837959 2.731197 2.784578
[2,] -0.008244123 -2.822531 2.806043 2.814287
[3,] -0.057124869 -2.872244 2.757994 2.815119
[4,] -0.044817235 -2.859962 2.770328 2.815145
[5,] -0.049536647 -2.864684 2.765611 2.815148
Code
cat("\n", tpar_fc_name_final, ":\n", sep = "")

X3PAr:
Code
print(fc_var_pace_final$fcst[[tpar_fc_name_final]])
            fcst       lower      upper         CI
[1,] 0.011806217 -0.02008553 0.04369797 0.03189175
[2,] 0.009800696 -0.02249224 0.04209363 0.03229294
[3,] 0.009379742 -0.02292588 0.04168536 0.03230562
[4,] 0.009320975 -0.02298498 0.04162693 0.03230595
[5,] 0.009308504 -0.02299746 0.04161446 0.03230596
Code
cat("\n", efg_fc_name_final, ":\n", sep = "")

eFG.:
Code
print(fc_var_pace_final$fcst[[efg_fc_name_final]])
             fcst       lower      upper         CI
[1,] 0.0059894985 -0.00729126 0.01927026 0.01328076
[2,] 0.0004066626 -0.01378868 0.01460200 0.01419534
[3,] 0.0016492776 -0.01258105 0.01587960 0.01423032
[4,] 0.0011891645 -0.01304598 0.01542431 0.01423515
[5,] 0.0013167992 -0.01291869 0.01555229 0.01423549
Code
# Plot forecasts
for (vname in fc_var_pace_names) {
    plot(fc_var_pace_final, names = vname)
}

Granger Causality Tests:

Code
# Granger causality tests
var_pace_names <- names(final_var_pace$varresult)
tpar_name_granger <- var_pace_names[grep("3PAr|PAr", var_pace_names, ignore.case = TRUE)]
if (length(tpar_name_granger) == 0) tpar_name_granger <- var_pace_names[2]

efg_name_granger <- var_pace_names[grep("eFG%|eFG", var_pace_names, ignore.case = TRUE)]
if (length(efg_name_granger) == 0) efg_name_granger <- var_pace_names[3]

granger_pace_all <- causality(final_var_pace, cause = "Pace")$Granger
granger_3par_pace <- causality(final_var_pace, cause = tpar_name_granger)$Granger
granger_efg_pace <- causality(final_var_pace, cause = efg_name_granger)$Granger

cat("Pace → {3PAr, eFG%}: F =", round(granger_pace_all$statistic, 3),
    ", p =", round(granger_pace_all$p.value, 4), "\n")
Pace → {3PAr, eFG%}: F = 0.368 , p = 0.6928 
Code
cat("3PAr → {Pace, eFG%}: F =", round(granger_3par_pace$statistic, 3),
    ", p =", round(granger_3par_pace$p.value, 4), "\n")
3PAr → {Pace, eFG%}: F = 2.809 , p = 0.0643 
Code
cat("eFG% → {Pace, 3PAr}: F =", round(granger_efg_pace$statistic, 3),
    ", p =", round(granger_efg_pace$p.value, 4), "\n")
eFG% → {Pace, 3PAr}: F = 0.017 , p = 0.9835 

Granger causality tests reveal which strategic variables lead versus follow in NBA evolution. If 3PAr Granger-causes Pace, this suggests teams first adopted three-point shooting, then adjusted their tempo accordingly. Conversely, if Pace Granger-causes 3PAr, faster play may have created the transition opportunities that made three-point volume feasible. The eFG% tests reveal whether shooting skill improvements preceded or followed strategic changes.
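The mechanics behind these tests can be sketched in base R as a restricted-versus-unrestricted regression comparison on simulated data (variable names are illustrative, not the chapter's series; `causality()` performs the analogous F test jointly across all equations of the VAR):

```r
# Sketch: bivariate Granger test as a restricted-vs-unrestricted F-test.
# Simulate two AR(1)-style series where x genuinely drives y.
set.seed(42)
n <- 200
x <- numeric(n); y <- numeric(n)
for (t in 2:n) {
    x[t] <- 0.5 * x[t - 1] + rnorm(1)
    y[t] <- 0.3 * y[t - 1] + 0.4 * x[t - 1] + rnorm(1)  # x -> y
}
d <- data.frame(y = y[3:n], y_l1 = y[2:(n - 1)], x_l1 = x[2:(n - 1)])

unrestricted <- lm(y ~ y_l1 + x_l1, data = d)  # includes lagged x
restricted   <- lm(y ~ y_l1, data = d)         # drops lagged x
print(anova(restricted, unrestricted))         # small p => x Granger-causes y
```

A small p-value says lagged x adds predictive power for y beyond y's own history, which is all "Granger causality" claims: temporal precedence in prediction, not structural causation.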

Code
# IRFs
irf_3par_pace <- irf(final_var_pace, impulse = tpar_name_granger, response = "Pace", n.ahead = 10)
plot(irf_3par_pace, main = paste("Impulse:", tpar_name_granger, "→ Response: Pace"))

Code
irf_efg_pace <- irf(final_var_pace, impulse = efg_name_granger, response = "Pace", n.ahead = 10)
plot(irf_efg_pace, main = paste("Impulse:", efg_name_granger, "→ Response: Pace"))

Code
irf_pace_3par <- irf(final_var_pace, impulse = "Pace", response = tpar_name_granger, n.ahead = 10)
plot(irf_pace_3par, main = paste("Impulse: Pace → Response:", tpar_name_granger))

The Pace Dynamics VAR model uncovers the temporal relationships between game tempo, shot selection, and shooting efficiency. This addresses a fundamental question in basketball analytics: did the analytics revolution’s emphasis on three-point shooting cause teams to speed up, or did faster pace create opportunities for more three-point attempts?

Impulse response functions trace the dynamic effects of shocks. If a one-time increase in Pace produces a lasting increase in 3PAr, this supports the hypothesis that faster tempo facilitates three-point volume through transition opportunities and defensive breakdowns. Conversely, if innovations in 3PAr lead to sustained changes in Pace, this suggests strategic shot selection drives tempo decisions. The eFG% impulses reveal whether shooting skill improvements enable teams to control pace or result from pace-driven shot quality.
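For a VAR(1), these responses have a closed form: the (non-orthogonalized) impulse response at horizon h is just the coefficient matrix raised to the h-th power applied to the shock vector. A minimal sketch with an illustrative coefficient matrix (not the fitted model's; `irf()` additionally orthogonalizes shocks by default):

```r
# Sketch: IRF by hand for y_t = c + A y_{t-1} + e_t.
# Response at horizon h to shock s is A^h %*% s.
A <- matrix(c(0.5, 0.1,
              0.2, 0.3), nrow = 2, byrow = TRUE)  # illustrative, stable
shock <- c(1, 0)  # one-unit shock to variable 1

irf_by_hand <- sapply(0:10, function(h) {
    Ah <- diag(2)
    if (h > 0) for (i in 1:h) Ah <- Ah %*% A
    Ah %*% shock
})
round(irf_by_hand[2, 1:4], 3)  # response of variable 2 at horizons 0-3
```

Because the eigenvalues of A lie inside the unit circle (as the fitted model's characteristic roots do), the responses decay toward zero: shocks have transitory, not permanent, effects.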

Cross-validation results demonstrate the model’s out-of-sample forecasting accuracy, with lower RMSE indicating that the bidirectional relationships captured by VAR improve predictions beyond univariate models. The optimal lag structure reveals how far back past values continue to influence future ones: a longer optimal lag suggests persistent momentum effects, while shorter lags indicate rapid strategic adjustments where teams quickly respond to innovations.
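The expanding-window scheme behind this kind of RMSE comparison can be sketched on a univariate series for brevity (the simulated series and window sizes are illustrative, not the chapter's data):

```r
# Sketch: expanding-window (rolling-origin) cross-validation.
# At each origin, fit on observations 1..origin and score the
# one-step-ahead forecast against the held-out next observation.
set.seed(1)
y <- arima.sim(list(ar = 0.6), n = 44)  # same length as the pace sample

errs <- c()
for (origin in 30:43) {
    train <- y[1:origin]
    fit <- arima(train, order = c(1, 0, 0))     # AR(1) stand-in for VAR
    fc <- predict(fit, n.ahead = 1)$pred
    errs <- c(errs, y[origin + 1] - fc)         # one-step forecast error
}
rmse <- sqrt(mean(errs^2))
rmse
```

Each forecast uses only information available at its origin, so the RMSE measures genuine out-of-sample performance rather than in-sample fit.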

This VAR framework captures the complex adaptive dynamics of NBA strategy, where teams continuously adjust multiple dimensions simultaneously rather than optimizing single variables in isolation. The analytics revolution wasn’t just about shooting more threes; it was about reconfiguring pace, shot selection, and skill development in an interconnected system where each element reinforces the others.


Sports Betting & NBA Recovery (VAR)

Before running Model 5, we need to load and process DKNG stock data:

Code
# Load DKNG daily stock data
dkng_data <- read_csv("data/financial/DKNG_daily.csv", show_col_types = FALSE)

# Convert to annual averages (matching NBA season structure)
dkng_annual <- dkng_data %>%
    mutate(
        Date = as.Date(Date),
        Year = year(Date),
        # NBA seasons span two calendar years; use the ending year
        # e.g., 2019-2020 season → 2020
        Season = ifelse(month(Date) >= 10, Year + 1, Year)
    ) %>%
    group_by(Season) %>%
    summarise(
        DKNG_Price = mean(`Adj Close`, na.rm = TRUE),
        .groups = "drop"
    )

# Merge with league_avg data
league_avg_dkng <- league_avg %>%
    left_join(dkng_annual, by = "Season") %>%
    filter(!is.na(DKNG_Price)) # Only keep seasons with DKNG data (2020+)

cat("DKNG data loaded and merged. Available seasons:", min(league_avg_dkng$Season), "-", max(league_avg_dkng$Season), "\n")
DKNG data loaded and merged. Available seasons: 2020 - 2025 
Code
cat("Number of observations:", nrow(league_avg_dkng), "\n\n")
Number of observations: 6 
Code
head(league_avg_dkng %>% dplyr::select(Season, DKNG_Price, Total_Attendance, ORtg))
# A tibble: 6 × 4
  Season DKNG_Price Total_Attendance  ORtg
   <dbl>      <dbl>            <dbl> <dbl>
1   2020       35.8         17850427  111.
2   2021       53.3          1533769  112.
3   2022       22.7         21559732  112.
4   2023       20.8         22936759  115.
5   2024       37.9         23289681  115.
6   2025       39.7         23065137  115.
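The season-mapping rule used in the merge above (games from October onward belong to the season ending the following calendar year) can be sketched in base R; the helper name `season_of` is illustrative:

```r
# Sketch: map a game date to its NBA season's ending year.
# October-December dates roll forward to the next calendar year.
season_of <- function(date) {
    d <- as.Date(date)
    yr <- as.integer(format(d, "%Y"))
    ifelse(as.integer(format(d, "%m")) >= 10, yr + 1, yr)
}

season_of("2019-12-25")  # 2019-20 season -> 2020
season_of("2020-03-11")  # still the 2019-20 season -> 2020
season_of("2020-11-01")  # 2020-21 season -> 2021
```

The cutoff at October matches the typical season start; edge cases like the COVID-shifted 2020-21 season (which tipped off in December) still map correctly under this rule.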

Sports Betting & NBA Recovery (VAR)

Variables: DKNG (DraftKings stock price), Attendance, ORtg

This VAR model tests a provocative hypothesis: does NBA performance influence sports betting industry valuations? The 2020-2025 period offers a natural experiment: COVID decimated attendance and disrupted seasons, while sports betting stocks experienced extreme volatility. By modeling bidirectional relationships between DraftKings’ stock price, NBA attendance, and offensive efficiency, we test whether the on-court analytics revolution created measurable financial value in connected industries. If ORtg Granger-causes DKNG, this suggests exciting offensive basketball drives betting engagement and stock valuations. If Attendance Granger-causes DKNG, this indicates fan engagement metrics predict betting industry performance.

Code
# Create VAR dataset (2020-2025 only, when DKNG data exists)
var_dkng_data <- ts(league_avg_dkng %>% dplyr::select(DKNG_Price, Total_Attendance, ORtg),
    start = min(league_avg_dkng$Season), frequency = 1
)

# Plot all series
autoplot(var_dkng_data, facets = TRUE) +
    labs(
        title = "VAR Variables: DKNG Price, Attendance, ORtg (2020-2025)",
        x = "Year", y = "Value"
    ) +
    theme_minimal()

Code
cat("Summary Statistics:\n")
Summary Statistics:
Code
summary(var_dkng_data)
   DKNG_Price    Total_Attendance        ORtg      
 Min.   :20.79   Min.   : 1533769   Min.   :110.5  
 1st Qu.:25.95   1st Qu.:18777753   1st Qu.:112.1  
 Median :36.86   Median :22248246   Median :113.4  
 Mean   :35.02   Mean   :18372584   Mean   :113.2  
 3rd Qu.:39.24   3rd Qu.:23033042   3rd Qu.:114.7  
 Max.   :53.28   Max.   :23289681   Max.   :115.3  

The visualization captures post-COVID dynamics: DKNG stock exhibits extreme volatility as the sports betting industry navigated legalization waves and market uncertainty. Attendance shows the pandemic collapse in 2020-2021 and a gradual recovery. ORtg continues its steady climb, demonstrating that on-court analytics proceeded independently of external shocks. The limited sample size (only six observations) presents statistical challenges but reflects the natural constraint that DKNG only became public in 2020.

Code
# ADF tests for each series
# With only 6 observations, ADF tests are unreliable, so they are
# commented out below and the series are kept in levels
# adf_dkng_var <- tryCatch(adf.test(var_dkng_data[, "DKNG_Price"]), error = function(e) NULL)
# adf_attend_var <- tryCatch(adf.test(var_dkng_data[, "Total_Attendance"]), error = function(e) NULL)
# adf_ortg_var_dkng <- tryCatch(adf.test(var_dkng_data[, "ORtg"]), error = function(e) NULL)

# if (!is.null(adf_dkng_var)) {
#     cat(
#         "DKNG Price: ADF p-value =", round(adf_dkng_var$p.value, 4),
#         ifelse(adf_dkng_var$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
#     )
# } else {
#     cat("DKNG Price: ADF test failed (insufficient data)\n")
# }

# if (!is.null(adf_attend_var)) {
#     cat(
#         "Attendance: ADF p-value =", round(adf_attend_var$p.value, 4),
#         ifelse(adf_attend_var$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
#     )
# } else {
#     cat("Attendance: ADF test failed (insufficient data)\n")
# }

# if (!is.null(adf_ortg_var_dkng)) {
#     cat(
#         "ORtg: ADF p-value =", round(adf_ortg_var_dkng$p.value, 4),
#         ifelse(adf_ortg_var_dkng$p.value < 0.05, "(stationary)", "(non-stationary)"), "\n"
#     )
# } else {
#     cat("ORtg: ADF test failed (insufficient data)\n")
# }

var_dkng_data_final <- var_dkng_data
differenced_dkng <- FALSE
Code
# Determine optimal lag order
# With very short series, limit lag.max
var_dkng_select <- tryCatch(
    {
        VARselect(var_dkng_data_final, lag.max = 2, type = "const")
    },
    error = function(e) {
        list(selection = c(AIC = 1, HQ = 1, SC = 1, FPE = 1))
    }
)

print(var_dkng_select$selection)
AIC(n)  HQ(n)  SC(n) FPE(n) 
     1      1      1      2 
Code
if ("criteria" %in% names(var_dkng_select)) {
    print(var_dkng_select$criteria)
}
          1    2
AIC(n) -Inf -Inf
HQ(n)  -Inf -Inf
SC(n)  -Inf -Inf
FPE(n)  NaN    0
Code
# Fit models with different lag orders
lags_dkng_to_fit <- unique(var_dkng_select$selection[1:3])

# Ensure we have at least one lag to fit
if (length(lags_dkng_to_fit) == 0 || any(is.na(lags_dkng_to_fit))) {
    lags_dkng_to_fit <- 1
}

# Given short series, limit to max lag 1
lags_dkng_to_fit <- lags_dkng_to_fit[lags_dkng_to_fit <= 1]
if (length(lags_dkng_to_fit) == 0) lags_dkng_to_fit <- 1

cat("\nFitting VAR models with p =", paste(lags_dkng_to_fit, collapse = ", "), "\n")

Fitting VAR models with p = 1 
Code
var_dkng_models <- list()
for (p in lags_dkng_to_fit) {
    model_fit <- tryCatch(
        {
            VAR(var_dkng_data_final, p = p, type = "const")
        },
        error = function(e) {
            NULL
        }
    )
    if (!is.null(model_fit)) {
        var_dkng_models[[paste0("VAR_", p)]] <- model_fit
    }
}
Code
for (name in names(var_dkng_models)) {
    cat("========================================\n")
    cat(name, "Summary:\n")
    cat("========================================\n\n")

    # Try to print summary, but catch errors due to near-singular covariance matrices
    summary_result <- tryCatch(
        {
            summary(var_dkng_models[[name]])
        },
        error = function(e) {
            print(coef(var_dkng_models[[name]]))
            NULL
        }
    )

    if (!is.null(summary_result)) {
        print(summary_result)
    }

    cat("\n")
}
========================================
VAR_1 Summary:
========================================
$DKNG_Price
                         Estimate   Std. Error    t value  Pr(>|t|)
DKNG_Price.l1        9.498217e-01 1.425621e+00  0.6662513 0.6258490
Total_Attendance.l1  1.928166e-06 2.182555e-06  0.8834444 0.5393466
ORtg.l1             -2.628827e+00 5.556130e+00 -0.4731399 0.7186589
const                2.658871e+02 6.044889e+02  0.4398545 0.7361944

$Total_Attendance
                         Estimate   Std. Error    t value  Pr(>|t|)
DKNG_Price.l1       -5.214812e+05 7.099819e+05 -0.7344993 0.5966971
Total_Attendance.l1 -9.539561e-01 1.086947e+00 -0.8776471 0.5414254
ORtg.l1              4.410249e+06 2.767041e+06  1.5938501 0.3567182
const               -4.454000e+08 3.010451e+08 -1.4795125 0.3783855

$ORtg
                         Estimate   Std. Error     t value  Pr(>|t|)
DKNG_Price.l1       -8.234494e-02 4.804422e-02 -1.71394054 0.3362384
Total_Attendance.l1  2.112814e-09 7.355332e-08  0.02872492 0.9817182
ORtg.l1              3.948660e-01 1.872447e-01  2.10882339 0.2818913
const                7.194670e+01 2.037162e+01  3.53171326 0.1756600
Code
# Calculate in-sample RMSE for each model
rmse_dkng_results <- data.frame()

for (name in names(var_dkng_models)) {
    model <- var_dkng_models[[name]]

    # Get fitted values
    fitted_vals <- fitted(model)

    # Calculate RMSE for each variable
    rmse_dkng_price <- sqrt(mean((var_dkng_data_final[, "DKNG_Price"] - fitted_vals[, "DKNG_Price"])^2, na.rm = TRUE))
    rmse_attendance <- sqrt(mean((var_dkng_data_final[, "Total_Attendance"] - fitted_vals[, "Total_Attendance"])^2, na.rm = TRUE))
    rmse_ortg_dkng <- sqrt(mean((var_dkng_data_final[, "ORtg"] - fitted_vals[, "ORtg"])^2, na.rm = TRUE))

    # Average RMSE (in raw units, so the attendance term dominates the mean;
    # harmless here since only one lag order is being compared)
    rmse_avg_dkng <- mean(c(rmse_dkng_price, rmse_attendance, rmse_ortg_dkng), na.rm = TRUE)

    cat(name, ": DKNG RMSE =", round(rmse_dkng_price, 2),
        "| Attendance RMSE =", round(rmse_attendance / 1e6, 2), "M",
        "| ORtg RMSE =", round(rmse_ortg_dkng, 2), "\n")

    # Create descriptive model name
    lag_num <- as.numeric(gsub("VAR_", "", name))
    model_label <- paste0("VAR(", lag_num, ")")

    rmse_dkng_results <- rbind(rmse_dkng_results, data.frame(
        Model = model_label,
        Lags = lag_num,
        RMSE_DKNG = rmse_dkng_price,
        RMSE_Attendance = rmse_attendance,
        RMSE_ORtg = rmse_ortg_dkng,
        RMSE_Avg = rmse_avg_dkng
    ))
}
VAR_1 : DKNG RMSE = 13.92 | Attendance RMSE = 11.93 M | ORtg RMSE = 1.6 
In-Sample Fit: DKNG VAR Models (2020-2025)

Model   Lags  RMSE (DKNG $)  RMSE (Attend. M)  RMSE (ORtg)  Avg RMSE (raw units)
VAR(1)     1        13.9204           11.9268       1.6014               3975595
Code
if (exists("rmse_dkng_results") && nrow(rmse_dkng_results) > 0) {
    best_var_dkng_idx <- which.min(rmse_dkng_results$RMSE_Avg)
    cat("\nBest model:", rmse_dkng_results$Model[best_var_dkng_idx], "(Average RMSE =", round(rmse_dkng_results$RMSE_Avg[best_var_dkng_idx], 2), ")\n")
} else {
    best_var_dkng_idx <- 1
}

Best model: VAR(1) (Average RMSE = 3975595 )
Code
# Select best model
if (exists("rmse_dkng_results") && nrow(rmse_dkng_results) > 0 && exists("best_var_dkng_idx")) {
    best_var_dkng_name <- rmse_dkng_results$Model[best_var_dkng_idx]
    best_var_dkng_lags <- rmse_dkng_results$Lags[best_var_dkng_idx]
} else {
    best_var_dkng_lags <- 1
}

cat("Final VAR Model: VAR(", best_var_dkng_lags, ")\n\n", sep = "")
Final VAR Model: VAR(1)
Code
# Use the best fitted model
final_var_dkng <- var_dkng_models[[paste0("VAR_", best_var_dkng_lags)]]

# Try to print summary, catch errors due to near-singular covariance
summary_result <- tryCatch(
    {
        summary(final_var_dkng)
    },
    error = function(e) {
        print(coef(final_var_dkng))
        NULL
    }
)
$DKNG_Price
                         Estimate   Std. Error    t value  Pr(>|t|)
DKNG_Price.l1        9.498217e-01 1.425621e+00  0.6662513 0.6258490
Total_Attendance.l1  1.928166e-06 2.182555e-06  0.8834444 0.5393466
ORtg.l1             -2.628827e+00 5.556130e+00 -0.4731399 0.7186589
const                2.658871e+02 6.044889e+02  0.4398545 0.7361944

$Total_Attendance
                         Estimate   Std. Error    t value  Pr(>|t|)
DKNG_Price.l1       -5.214812e+05 7.099819e+05 -0.7344993 0.5966971
Total_Attendance.l1 -9.539561e-01 1.086947e+00 -0.8776471 0.5414254
ORtg.l1              4.410249e+06 2.767041e+06  1.5938501 0.3567182
const               -4.454000e+08 3.010451e+08 -1.4795125 0.3783855

$ORtg
                         Estimate   Std. Error     t value  Pr(>|t|)
DKNG_Price.l1       -8.234494e-02 4.804422e-02 -1.71394054 0.3362384
Total_Attendance.l1  2.112814e-09 7.355332e-08  0.02872492 0.9817182
ORtg.l1              3.948660e-01 1.872447e-01  2.10882339 0.2818913
const                7.194670e+01 2.037162e+01  3.53171326 0.1756600
Code
if (!is.null(summary_result)) {
    print(summary_result)
} else {
    cat("\n")
}
Code
# Forecast 3 periods ahead (conservative given data limitations)
fc_var_dkng_final <- predict(final_var_dkng, n.ahead = 3)

# Display forecasts
cat("\n3-Period Forecasts (2026-2028):\n\n")

3-Period Forecasts (2026-2028):
Code
cat("DKNG Price:\n")
DKNG Price:
Code
print(fc_var_dkng_final$fcst$DKNG_Price)
         fcst    lower    upper       CI
[1,] 46.96449 7.898581 86.03040 39.06591
[2,] 43.75018 4.565139 82.93523 39.18504
[3,] 41.77703 2.547918 81.00614 39.22911
Code
cat("\nTotal Attendance:\n")

Total Attendance:
Code
print(fc_var_dkng_final$fcst$Total_Attendance)
         fcst    lower    upper       CI
[1,] 17020013 -2435433 36475458 19455445
[2,] 16432957 -4461065 37326979 20894022
[3,] 14958552 -8647399 38564503 23605951
Code
cat("\nORtg:\n")

ORtg:
Code
print(fc_var_dkng_final$fcst$ORtg)
         fcst    lower    upper       CI
[1,] 113.9528 112.6363 115.2694 1.316543
[2,] 113.1115 109.1108 117.1122 4.000673
[3,] 113.0427 108.6723 117.4131 4.370399
Code
# Plot forecasts
for (vname in names(fc_var_dkng_final$fcst)) {
    plot(fc_var_dkng_final, names = vname)
}

Granger Causality Tests:

Code
# Check if model exists
if (exists("final_var_dkng") && !is.null(final_var_dkng)) {
    # Granger causality tests with error suppression
    granger_dkng <- tryCatch(
        causality(final_var_dkng, cause = "DKNG_Price")$Granger,
        error = function(e) NULL
    )

    granger_attend_dkng <- tryCatch(
        causality(final_var_dkng, cause = "Total_Attendance")$Granger,
        error = function(e) NULL
    )

    granger_ortg_dkng <- tryCatch(
        causality(final_var_dkng, cause = "ORtg")$Granger,
        error = function(e) NULL
    )

    # Display results
    if (!is.null(granger_dkng)) {
        cat("DKNG Price → {Attendance, ORtg}:\n")
        cat("  F-statistic =", round(granger_dkng$statistic, 3), "\n")
        cat("  p-value =", round(granger_dkng$p.value, 4), "\n")
        cat("  Interpretation:", ifelse(granger_dkng$p.value < 0.05,
                                        "DKNG Price Granger-causes other variables",
                                        "No significant Granger causality"), "\n\n")
    }

    if (!is.null(granger_attend_dkng)) {
        cat("Attendance → {DKNG, ORtg}:\n")
        cat("  F-statistic =", round(granger_attend_dkng$statistic, 3), "\n")
        cat("  p-value =", round(granger_attend_dkng$p.value, 4), "\n")
        cat("  Interpretation:", ifelse(granger_attend_dkng$p.value < 0.05,
                                        "Attendance Granger-causes other variables",
                                        "No significant Granger causality"), "\n\n")
    }

    if (!is.null(granger_ortg_dkng)) {
        cat("ORtg → {DKNG, Attendance}:\n")
        cat("  F-statistic =", round(granger_ortg_dkng$statistic, 3), "\n")
        cat("  p-value =", round(granger_ortg_dkng$p.value, 4), "\n")
        cat("  Interpretation:", ifelse(granger_ortg_dkng$p.value < 0.05,
                                        "ORtg Granger-causes other variables",
                                        "No significant Granger causality"), "\n\n")
    }

    # Summary note
    if (is.null(granger_dkng) && is.null(granger_attend_dkng) && is.null(granger_ortg_dkng)) {
        cat("Note: Granger causality tests unavailable (n =", nrow(var_dkng_data_final), "observations).\n")
    }
}
Note: Granger causality tests unavailable (n = 6 observations).

Impulse Response Functions:

Code
# Check if model exists
if (exists("final_var_dkng") && !is.null(final_var_dkng)) {
    # Track successful plots
    plots_created <- 0

    # IRF: DKNG Price → Attendance
    irf_dkng_attend <- tryCatch(
        irf(final_var_dkng, impulse = "DKNG_Price", response = "Total_Attendance", n.ahead = 5),
        error = function(e) NULL
    )
    if (!is.null(irf_dkng_attend)) {
        plot(irf_dkng_attend, main = "Impulse: DKNG Price → Response: Attendance")
        plots_created <- plots_created + 1
    }

    # IRF: Attendance → DKNG Price
    irf_attend_dkng <- tryCatch(
        irf(final_var_dkng, impulse = "Total_Attendance", response = "DKNG_Price", n.ahead = 5),
        error = function(e) NULL
    )
    if (!is.null(irf_attend_dkng)) {
        plot(irf_attend_dkng, main = "Impulse: Attendance → Response: DKNG Price")
        plots_created <- plots_created + 1
    }

    # IRF: ORtg → DKNG Price
    irf_ortg_dkng <- tryCatch(
        irf(final_var_dkng, impulse = "ORtg", response = "DKNG_Price", n.ahead = 5),
        error = function(e) NULL
    )
    if (!is.null(irf_ortg_dkng)) {
        plot(irf_ortg_dkng, main = "Impulse: ORtg → Response: DKNG Price")
        plots_created <- plots_created + 1
    }

    # IRF: DKNG Price → ORtg (additional relationship)
    irf_dkng_ortg <- tryCatch(
        irf(final_var_dkng, impulse = "DKNG_Price", response = "ORtg", n.ahead = 5),
        error = function(e) NULL
    )
    if (!is.null(irf_dkng_ortg)) {
        plot(irf_dkng_ortg, main = "Impulse: DKNG Price → Response: ORtg")
        plots_created <- plots_created + 1
    }

    # Summary note
    if (plots_created == 0) {
        cat("\nNote: IRF unavailable (n =", nrow(var_dkng_data_final), "observations).\n")
    } else {
        cat("\nGenerated", plots_created, "IRF plot(s). Interpret with caution (n =", nrow(var_dkng_data_final), ").\n")
    }
}

Note: IRF unavailable (n = 6 observations).

This VAR model tests whether NBA performance metrics influence sports betting industry valuations. With DKNG public only since April 2020, we have just six observations, severely limiting statistical inference. Granger causality tests and IRF analyses lack the degrees of freedom needed for reliable results.

Despite data constraints, this framework demonstrates the interconnected sports-finance ecosystem: NBA attendance and offensive efficiency potentially drive betting industry valuations. The COVID period introduces extreme outliers that dominate the short sample, making results exploratory rather than definitive. As more data accumulates (2025+), future analyses can test whether these relationships persist beyond the pandemic disruption.

References

1. Kubatko, J., Oliver, D., Pelton, K. & Rosenbaum, D. T. A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports 3, (2007).
2. Skinner, B. The problem of shot selection in basketball. PLoS ONE 7, e30776 (2012).
3. Franks, A., Miller, A., Bornn, L. & Goldsberry, K. Characterizing the spatial structure of defensive skill in professional basketball. (2015).
4. Sliz, B. A. An investigation of three-point shooting through an analysis of NBA player tracking data. arXiv preprint arXiv:1703.07030 (2017).
5. Poropudas, J. & Halme, T. Dean Oliver's four factors revisited. arXiv preprint arXiv:2305.13032 (2023).